Reinforcement learning algorithms: A brief survey
| Published in: | Expert Systems with Applications, Vol. 231, Article 120495 |
|---|---|
| Main Authors: | Shakya, Ashish Kumar; Pillai, Gopinatha; Chakrabarty, Sohom |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 30.11.2023 |
| Subjects: | Reinforcement learning; Deep Reinforcement Learning (DRL); Function approximation; Stochastic optimal control |
| ISSN: | 0957-4174 (print); 1873-6793 (electronic) |
| DOI: | 10.1016/j.eswa.2023.120495 |
Highlights:

•RL can be used to solve problems involving sequential decision-making.
•RL is based on trial-and-error learning through rewards and punishments.
•The ultimate goal of an RL agent is to maximize cumulative reward.
•An RL agent tries to learn the optimal value and policy functions.
•DNN-based function approximation is used to approximate the value and policy functions.

Abstract:

Reinforcement Learning (RL) is a machine learning (ML) technique for learning sequential decision-making in complex problems. RL is inspired by the trial-and-error learning of humans and animals: an agent can learn an optimal policy autonomously, using knowledge obtained through continuous interaction with a stochastic dynamical environment. Problems once considered virtually impossible, such as learning to play video games from pixel information alone, are now solved successfully with deep reinforcement learning, and RL agents can surpass human performance on challenging tasks without human intervention. This review gives a broad overview of RL, covering its fundamental principles, essential methods, and illustrative applications, and aims to serve as an initial reference point for researchers beginning work in RL. The authors cover fundamental model-free RL algorithms as well as the pathbreaking function-approximation-based deep RL (DRL) algorithms that handle complex, uncertain tasks with continuous state and action spaces and have made RL useful in various interdisciplinary fields. The article also briefly reviews model-based and multi-agent RL approaches, and closes with some promising research directions for RL.
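As a minimal formal anchor for the highlights above (these are the standard textbook definitions the survey builds on, not equations quoted from this record), the agent maximizes the expected discounted return, and the optimal action-value function obeys the Bellman optimality equation:

$$G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma < 1,$$

$$Q^{*}(s,a) = \mathbb{E}\big[\, R_{t+1} + \gamma \max_{a'} Q^{*}(S_{t+1}, a') \;\big|\; S_t = s,\ A_t = a \,\big].$$

An optimal policy then acts greedily with respect to $Q^{*}$: $\pi^{*}(s) = \arg\max_{a} Q^{*}(s,a)$.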
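To make "fundamental model-free RL algorithms" concrete, here is a minimal sketch of tabular Q-learning on a toy chain environment. The environment, its parameters, and all names are illustrative assumptions, not code from the article:

```python
import random

# Toy deterministic chain (an illustrative assumption): states 0..N_STATES-1,
# action 0 moves left, action 1 moves right; the only reward is 1.0 on
# reaching the rightmost (terminal) state.
N_STATES = 6
ACTIONS = (0, 1)

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: off-policy TD control with epsilon-greedy behavior."""
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Bootstrap from the greedy value of the next state (0 at the terminal).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

if __name__ == "__main__":
    q = q_learning()
    # Greedy action per non-terminal state; expect all 1s (always move right).
    print([max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)])
```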
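The last highlight, DNN-based approximation of the value function, can be sketched in the same toy setting. Below, a single-hidden-layer network trained with semi-gradient Q-learning stands in for the deep networks used in DRL; the architecture, sizes, and step sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same toy chain as the tabular sketch; one-hot state encoding.
N_STATES, N_ACTIONS, HIDDEN = 6, 2, 16
GAMMA, ALPHA, EPSILON = 0.9, 0.05, 0.2

# Single-hidden-layer Q-network: Q(s, .) = W2 @ tanh(W1 @ onehot(s)).
W1 = rng.normal(0.0, 0.5, (HIDDEN, N_STATES))
W2 = rng.normal(0.0, 0.5, (N_ACTIONS, HIDDEN))

def q_values(state):
    """Forward pass; also return the activations needed for the backward pass."""
    x = np.zeros(N_STATES)
    x[state] = 1.0
    h = np.tanh(W1 @ x)
    return W2 @ h, h, x

def step(state, action):
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

for _ in range(3000):
    state, done = 0, False
    while not done:
        q, h, x = q_values(state)
        action = int(rng.integers(N_ACTIONS)) if rng.random() < EPSILON else int(np.argmax(q))
        next_state, reward, done = step(state, action)
        # Semi-gradient TD: the bootstrapped target is treated as a constant.
        q_next, _, _ = q_values(next_state)
        target = reward + (0.0 if done else GAMMA * np.max(q_next))
        td_error = target - q[action]
        # Backpropagate the TD error through the chosen action's output only.
        grad_h = td_error * W2[action] * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        W2[action] += ALPHA * td_error * h
        W1 += ALPHA * np.outer(grad_h, x)
        state = next_state

# The learned greedy policy typically moves right on every non-terminal state.
print([int(np.argmax(q_values(s)[0])) for s in range(N_STATES - 1)])
```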
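The model-based RL approaches the abstract mentions can be grounded the same way: fit a tabular model (empirical transition probabilities and mean rewards) from random-policy experience, then plan on the learned model with value iteration. The chain environment and constants are the same illustrative assumptions as above:

```python
import random
from collections import defaultdict

N_STATES, N_ACTIONS, GAMMA = 6, 2, 0.9

def step(state, action):
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

# 1) Collect experience with a random behavior policy and fit a tabular model:
#    transition counts and summed rewards for each (state, action) pair.
counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': visit count}
reward_sum = defaultdict(float)                 # (s, a) -> summed reward
for _ in range(200):
    state, done = 0, False
    while not done:
        action = random.randrange(N_ACTIONS)
        next_state, reward, done = step(state, action)
        counts[(state, action)][next_state] += 1
        reward_sum[(state, action)] += reward
        state = next_state

# 2) Plan on the learned model: value iteration over the estimated MDP.
#    The terminal state N_STATES-1 keeps value 0.
values = [0.0] * N_STATES
for _ in range(100):
    for s in range(N_STATES - 1):
        q_sa = []
        for a in range(N_ACTIONS):
            n = sum(counts[(s, a)].values())
            if n == 0:
                q_sa.append(0.0)  # unvisited pair: no estimate yet
                continue
            r_hat = reward_sum[(s, a)] / n
            exp_next = sum(c / n * values[s2] for s2, c in counts[(s, a)].items())
            q_sa.append(r_hat + GAMMA * exp_next)
        values[s] = max(q_sa)

print(values)  # values should increase toward the goal end of the chain
```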
– year: 2016 ident: b0210 publication-title: OpenAI Gym. – reference: (pp. 501–510). – volume: 18 start-page: 2041 year: 2022 end-page: 2052 ident: b1240 article-title: Modeling, detecting, and mitigating threats against industrial healthcare systems: A combined software defined networking and reinforcement learning approach publication-title: IEEE Transactions on Industrial Informatics – reference: Schmitt, S., Hessel, M., & Simonyan, K. (2019). Off-policy actor-critic with shared experience replay. arXiv preprint arXiv:1909.11583. – volume: 106 start-page: 104451 year: 2021 ident: b1750 article-title: Quantum deep reinforcement learning for rotor side converter control of double-fed induction generator-based wind turbines publication-title: Engineering Applications of Artificial Intelligence – reference: Schulman, J., Levine, S., Moritz, P., Jordan, M., & Abbeel, P. (2015). Trust region policy optimization. In – reference: Sukhbaatar, S., Szlam, A., & Fergus, R. (2016). Learning multiagent communication with backpropagation. In – volume: 4 start-page: 132 year: 2019 end-page: 141 ident: b1785 article-title: Energy-efficient scheduling for real-time systems based on deep Q-learning model publication-title: IEEE Transactions on Sustainable Computing – volume: 8 start-page: 171058 year: 2020 end-page: 171077 ident: b0035 article-title: Reinforcement learning interpretation methods: A survey publication-title: IEEE Access – reference: (pp. 317–328). – volume: 23 start-page: 4909 year: 2022 end-page: 4926 ident: b0725 article-title: Deep reinforcement learning for autonomous driving: A survey publication-title: IEEE Transactions On Intelligent Transportation Systems – volume: 575 start-page: 350 year: 2019 end-page: 354 ident: b1625 article-title: Grandmaster level in StarCraft II using multi-agent reinforcement learning publication-title: Nature – volume: 50 start-page: 1 year: 2022 end-page: 22 ident: b0435 article-title: Applications of reinforcement learning for building energy efficiency control: A review publication-title: Journal of Building Engineering – reference: (pp. 353–360). – reference: Luo, F., Xu, T., Lai, H., Chen, X., Zhang, W., & Yu, Y. (2022a). A survey on model-based reinforcement learning. arXiv preprint arXiv:2206.09328v1. – reference: Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. In – reference: Gupta, J. K., Egorov, M., & Kochenderfer, M. (2017). Cooperative multi-agent control using deep reinforcement learning. In – reference: Fujimoto, S., Meger, D., & Precup, D. (2019). Off-policy deep reinforcement learning without exploration. In – reference: (pp. 222–229). – reference: Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. arXiv preprint arXiv arXiv:1505.04597. – reference: Van Seijen, H., & Sutton, R. S. (2014). True online TD(λ). In – volume: 261 start-page: 1 year: 2014 end-page: 31 ident: b1740 article-title: Reinforcement learning algorithms with function approximation: Recent advances and applications publication-title: Information Sciences – reference: Kulkarni, T. D., Narasimhan, K. R., Saeedi, A., & Tenenbaum, J. B. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In – reference: Luo, J., Li, C., Fan, Q., & Liu, Y. (2022b). A graph convolutional encoder and multi-head attention decoder network for TSP via reinforcement learning. 
– start-page: 177 year: 2009 end-page: 184 ident: b1610 article-title: A theoretical and empirical analysis of expected sarsa publication-title: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (IEEE) – volume: 21 start-page: 682 year: 2008 end-page: 697 ident: b1205 article-title: Reinforcement learning of motor skills with policy gradients publication-title: Neural Networks – volume: 55 start-page: 1 year: 2022 end-page: 38 ident: b0020 article-title: Reinforcement learning based recommender systems: A survey publication-title: ACM Computing Surveys – reference: (pp. 2672–2680). – reference: (pp. 664–671). – reference: (pp. 1146-1155). – volume: 596 start-page: 583 year: 2021 end-page: 589 ident: b0670 article-title: Highly accurate protein structure prediction with AlphaFold publication-title: Nature – reference: Foerster, J., Nardelli, N., Farquhar, G., Afouras, T., Torr, P. H. S., Kohli, P., & Whiteson, S. (2017). Stabilising experience replay for deep multi-agent reinforcement learning. In – reference: (pp. 10199–10210). – reference: (pp. 401–408). – volume: 54 start-page: 3215 year: 2020 end-page: 3238 ident: b0335 article-title: A survey on multi-agent deep reinforcement learning: From the perspective of challenges and applications publication-title: Artificial Intelligence Review – reference: Schaefer, A. M., Schneegass, D., Sterzing, V., & Udluft, S. (2007). A neural reinforcement learning approach to gas turbine control. In – year: 1972 ident: b0170 article-title: Dynamic programming – start-page: 1995 year: 2016 end-page: 2003 ident: b1670 article-title: Dueling network architectures for deep reinforcement learning publication-title: Proceedings of the 33rd International Conference on Machine Learning (ICML) – volume: 18 start-page: 2936 year: 2006 end-page: 2941 ident: b1565 article-title: Learning tetris using the noisy cross-entropy method publication-title: Neural Computation – reference: Duan, Y., Chen, X., Houthooft, R., Schulman, J., & Abbeel, P. (2016). Benchmarking deep reinforcement learning for continuous control. In – reference: (pp. 441–448). – reference: (ICRA 2004) (pp. 2619–2624). – reference: Foerster, J. N., Assael, Y. M., Freitas, N. de, & Whiteson, S. (2016b). Learning to communicate to solve riddles with deep distributed recurrent Q-networks. arXiv preprint arXiv:1602.02672. – reference: (pp. 1–12). – reference: Glanois, C., Weng, P., Zimmer, M., Li, D., Yang, T., Hao, J., & Liu, W. (2022). A survey on interpretable reinforcement learning. arXiv preprint arXiv: 2112.13112v2. – reference: Hafner, D., Pasukonis, J., Ba, J., & Lillicrap, T. (2023). Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104v1. – reference: Wayne, G., Hung, C. C., Amos, D., Mirza, M., Ahuja, A., Barwinska, A. G., Rae, J., Mirowski, P., Leibo, J. Z., Santoro, A., Gemici, M., Reynolds, M., Harley, T., Abramson, J., Mohamed, S., Rezende, D., Saxton, D., Cain, A., Hillier, C., Silver, D., Kavukcuoglu, K., Botvinick, M., Hassabis, D., & Lillicrap, T. (2018). Unsupervised predictive memory in a goal-directed agent. arXiv preprint arXiv: 1803.10760. – reference: Feinberg, V., Wan, A., Stoica, I., Jordan, M. I., Gonzalez, J. E., & Levine, S. (2018). Model-based value expansion for efficient model-free reinforcement learning. arXiv preprint arXiv: 1803.00101v1. – reference: Guss, W. H., Castro, M. Y., Devlin, S., Houghton, B., Kuno, N. 
S., Loomis, C., Milani, S., Mohanty, S., Nakata, K., Salakhutdinov, R., Schulman, J., Shiroshita, S., Topin, N., Ummadisingu, A., & Vinyals, O. (2021). NeurIPS 2020 Competition : The MineRL competition on sample efficient reinforcement learning using human priors. arXiv preprint arXiv:2101.11071. – reference: Kirsch, L., Steenkiste, S. Van, & Schmidhuber, J. (2020). Improving generalization in meta reinforcement learning using learned objectives. arXiv preprint arXiv:1910.04098. – reference: Marbach, P., Mihatsch, O., & Tsitsiklis, J. N. (1998). Call admission control and routing in integrated services networks using reinforcement learning. In – reference: (pp. 1–13). – volume: 64 start-page: 81 year: 2022 end-page: 93 ident: b0620 article-title: Graph neural network and multi-agent reinforcement learning for machine-process-system integrated control to optimize production yield publication-title: Journal of Manufacturing Systems – reference: Srinivas, A., Jabri, A., Abbeel, P., Levine, S., & Finn, C. (2018). Universal planning networks. arXiv preprint arXiv:1804.00645. – reference: (pp. 1048–1056). – reference: Wahlström, N., Schön, T. B., & Deisenroth, M. P. (2015). From pixels to torques: Policy learning with deep dynamical models. arXiv preprint arXiv: 1502.02251. – volume: 40 start-page: 75 year: 2023 end-page: 101 ident: b0875 article-title: Deep reinforcement learning in smart manufacturing: A review and prospects publication-title: CIRP Journal of Manufacturing Science and Technology – reference: Abdoos, M., Mozayani, N., & Bazzan, A. L. C. (2011). Traffic light control in non-stationary environments based on multi agent Q-learning. In – volume: 95 start-page: 103869 year: 2020 ident: b0750 article-title: Reinforcement learning for quadrupedal locomotion with design of continual–hierarchical curriculum publication-title: Engineering Applications of Artificial Intelligence – volume: 8 start-page: 293 year: 1992 end-page: 321 ident: b0895 article-title: Self-improving reactive agents based on reinforcement learning, planning and teaching publication-title: Machine Learning – reference: (pp. 1–14). – start-page: 11 year: 1975 end-page: 13 ident: b0740 article-title: A comparison of natural and artificial intelligence publication-title: ACM SIGART Bulletin – reference: Elmo: Computer Shogi Association, Results of the 27th world computer shogi championship. (2023). Retrieved from http://www2.computer-shogi.org/wcsc27/index_e.html. Accessed March 10, 2023. – volume: 521 start-page: 436 year: 2015 end-page: 444 ident: b0825 article-title: Deep learning publication-title: Nature – reference: Foerster, J. N., Assael, Y. M., De Freitas, N., & Whiteson, S. (2016a). Learning to communicate with deep multi-agent reinforcement learning. In – reference: Fortunato, M., Azar, M. G., Piot, B., Menick, J., Osband, I, Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., Blundell, C., & Legg, S. (2017). Noisy networks for exploration. arXiv preprint arXiv: 1706.10295v3. 
– volume: 12 start-page: 875 year: 2001 end-page: 889 ident: b1050 article-title: Learning to trade via direct reinforcement publication-title: IEEE Transactions on Neural Network – volume: 191 start-page: 116285 year: 2022 ident: b1780 article-title: A distributed real-time pricing strategy based on reinforcement learning approach for smart grid publication-title: Expert Systems With Applications – volume: 45 start-page: 2673 year: 1997 end-page: 2681 ident: b1375 article-title: Bidirectional recurrent neural networks publication-title: IEEE Transactions on Signal Processing – volume: 22 start-page: 1 year: 2022 end-page: 29 ident: b0375 article-title: A novel reinforcement learning collision avoidance algorithm for usvs based on maneuvering characteristics and COLREGs publication-title: Sensors – volume: 27 start-page: 1011 year: 2022 end-page: 1022 ident: b0235 article-title: A learning-based vehicle trajectory-tracking approach for autonomous vehicles with lidar failure under various lighting conditions publication-title: IEEE/ASME Transactions on Mechatronics – volume: 71 start-page: 2511 year: 2022 end-page: 2526 ident: b1065 article-title: Bio-inspired collision avoidance in swarm systems via deep reinforcement learning publication-title: IEEE Transactions on Vehicular Technology – reference: Matignon, L., Laurent, G. J., & Fort-piat, N. Le. (2007). Hysteretic Q-Learning : an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In – volume: 11 start-page: 11 year: 1997 end-page: 73 ident: b0080 article-title: Locally Weighted Learning publication-title: Artificial Intelligence Review – volume: 8 start-page: 229 year: 1992 end-page: 256 ident: b1705 article-title: Simple statistical gradient-following algorithms for connectionist reinforcement learning publication-title: Machine Learning – volume: 38 start-page: 126 year: 2019 end-page: 145 ident: b0775 article-title: SWIRL : A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards publication-title: The International Journal of Robotics Research – reference: (pp. 314–323). – reference: (pp. 5900–5907). – start-page: 650 year: 2007 end-page: 657 ident: b0685 article-title: Batch reinforcement learning in a complex domain publication-title: Proceedings of the 6th International Joint Conference On Autonomous Agents And Multiagent Systems – reference: (pp. 3652-3661). – reference: (pp. 1008–1014). – volume: 9 start-page: 3259 year: 2018 end-page: 3269 ident: b0285 article-title: Convolutional neural networks for automatic state-time feature extraction in reinforcement learning applied to residential load control publication-title: IEEE Transactions on Smart Grid – volume: 8 start-page: 323 year: 1999 end-page: 338 ident: b1275 article-title: Concepts and facilities of a neural reinforcement learning control architecture for technical process control publication-title: Neural Computing and Applications – reference: Agrawal, S. & Jia, R. (2017). Optimistic posterior sampling for reinforcement learning: worst-case regret bounds. In – reference: (pp. 202–211). 
– volume: 49 start-page: 161 year: 2002 end-page: 178 ident: b1145 article-title: Kernel-based reinforcement learning publication-title: Machine Learning – volume: 59 start-page: 3166 year: 2019 end-page: 3176 ident: b1475 article-title: Deep reinforcement learning for multiparameter optimization in de novo drug design publication-title: Journal of Chemical Information and Modeling – reference: Kalweit, G., & Boedecker, J. (2017). Uncertainty driven imagination for continuous deep reinforcement learning. In – volume: 78 start-page: 236 year: 2019 end-page: 247 ident: b1165 article-title: Reinforcement learning based compensation methods for robot manipulators publication-title: Engineering Applications of Artificial Intelligence – reference: Kapturowski, S., Ostrovski, G., Quan, J., Munos, R., & Dabney, W. (2019). Recurrent experience replay in distributed reinforcement learning. In – reference: Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In – reference: (pp. 201–208). – reference: Gao, Y., Xu, H., Lin, Ji., Yu, F., Levine, S., & Darrell, T. (2018). Reinforcement learning from imperfect demonstrations. arXiv preprint arXiv: 1802.05313. – reference: (pp. 4171–4186). – reference: Hasselt, H. V. (2010). Double Q-learning. In – volume: 33 start-page: 2045 year: 2022 end-page: 2056 ident: b0800 article-title: Deep reinforcement learning with modulated Hebbian plus Q-network architecture publication-title: IEEE Transactions on Neural Networks and Learning Systems – reference: Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P. P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Pinto, H, P. d. O., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., & Zhang, S. (2019). Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680. – volume: 21 start-page: 3133 year: 2019 end-page: 3174 ident: b0950 article-title: Applications of deep reinforcement learning in communications and networking: A survey publication-title: IEEE Communications Surveys and Tutorials – volume: 8 start-page: 176598 year: 2020 end-page: 176623 ident: b0710 article-title: A systematic review on reinforcement learning-based robotics within the last decade publication-title: IEEE Access – reference: Salakhutdinov, R., & Hinton, G. (2009). Deep Boltzmann Machines. In – volume: 4 start-page: 217 year: 1981 end-page: 246 ident: b1530 article-title: An adaptive network that constructs and uses an internal model of its world publication-title: Cognition and Brain Theory – reference: (pp. 1607-1612) – volume: 8 start-page: 8557 year: 2021 end-page: 8569 ident: b0380 article-title: Distributed deep reinforcement learning for renewable energy accommodation assessment with communication uncertainty in internet of energy publication-title: IEEE Internet Of Things Journal – reference: Scholl, P., Dietrich, F., Otte, C., & Udluft, S. (2023). Safe policy improvement approaches and their limitations. In – start-page: 1054 year: 2016 end-page: 1062 ident: b1060 article-title: Safe and efficient off-policy reinforcement learning publication-title: Proceedings of the 30th Conference on Neural Advances in Neural Information Processing Systems – reference: Tassa, Y., Doron, Y., Muldal, A., Erez, T., Li, Y., Casas, D. de Las, Budden, D., Abdolmaleki, A., Merel, J., Lefrancq, A., Lillicrap, T., & Riedmiller, M. 
(2018). DeepMind Control Suite. arXiv preprint arXiv:1801.00690. – reference: (pp. 2085–2087). – volume: 602 start-page: 328 year: 2022 end-page: 350 ident: b1465 article-title: AdaBoost maximum entropy deep inverse reinforcement learning with truncated gradient publication-title: Information Sciences – start-page: 2863 year: 2015 end-page: 2871 ident: b1130 article-title: Action-conditional video prediction using deep networks in Atari games publication-title: 28th International Conference on Neural Information Processing Systems – volume: 378 start-page: 1092 year: 2022 end-page: 1097 ident: b0855 article-title: Competition-level code generation with AlphaCode publication-title: Science – start-page: 1 year: 2022 end-page: 14 ident: b0325 article-title: Target-value-competition-based multi-agent deep reinforcement learning algorithm for distributed nonconvex economic dispatch publication-title: IEEE Transactions on power systems – volume: 518 start-page: 529 year: 2015 end-page: 533 ident: b1040 article-title: Human-level control through deep reinforcement learning publication-title: Nature – reference: S., & McFall, J. (2013). Concurrent reinforcement learning from customer interactions. In – reference: (pp. 1889–1897). – volume: 65 start-page: 87 year: 2017 end-page: 98 ident: b0570 article-title: Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies publication-title: Engineering Applications of Artificial Intelligence – volume: 610 start-page: 47 year: 2022 end-page: 53 ident: b0390 article-title: Discovering faster matrix multiplication algorithms with reinforcement learning publication-title: Nature – reference: Nazari, M., Oroojlooy, A., Snyder, L. V., & Takáč, M. (2018). Reinforcement learning for solving the vehicle routing problem. arXiv preprint arXiv:1802.04240. – reference: Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-critic algorithms. In – start-page: 2681 year: 2017 end-page: 2690 ident: b1140 article-title: Deep decentralized multi-task multi-agent reinforcement learning under partial observability publication-title: Proceedings of the 34th International Conference on Machine Learning – reference: . Lecture Notes in Computer Science, vol 7569. Springer, Berlin, Heidelberg. – reference: Beattie, C., Leibo, J. Z., Teplyashin, D., Ward, T., Wainwright, M., Küttler, H., Lefrancq, A., Green, S., Valdés, V., Sadik, A., Schrittwieser, J., Anderson, K., York, S., Cant, M., Cain, A., Bolton, A., Gaffney, S., King, H., Hassabis, D., Legg, S., & Petersen, S. (2016). DeepMind Lab. arXiv preprint arXiv: 1612.03801. – reference: Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., & Levine, S. (2018b). Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905. – volume: 134 start-page: 1 year: 2021 end-page: 15 ident: b0990 article-title: Reinforcement learning for combinatorial optimization: A survey publication-title: Computers & Operations Research – reference: Klopf, A. H. (1972). Brain function and adaptive systems: A heterostatic theory, Technical Report, Air Force Cambridge Research Labs Hanscom AFB MA. 
– volume: 8 start-page: 225945 year: 2020 end-page: 225956 ident: b0795 article-title: Coverage path planning for decomposition reconfigurable grid-maps using deep reinforcement learning based travelling salesman problem publication-title: IEEE Access – volume: 619 start-page: 930 year: 2023 end-page: 946 ident: b1660 article-title: Solving combinatorial optimization problems over graphs with BERT-Based deep reinforcement learning publication-title: Information Sciences – reference: Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual multi-agent policy gradients. In – reference: Fu, J., Kumar, A., Nachum, O., Tucker, G., & Levine, S. (2021). D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv: 2004.07219v4. – volume: 55 start-page: 589 year: 2019 end-page: 591 ident: b0985 article-title: Q-RTS : A real-time swarm intelligence based on multi-agent Q-learning publication-title: Electronics Letters – reference: (pp. 2001–2014). – reference: Lin, J., Chiu, H., & Gau, R. (2021). Decentralized planning-assisted deep reinforcement learning for collision and obstacle avoidance in UAV networks. In – volume: 331 start-page: 443 year: 2019 end-page: 457 ident: b1810 article-title: Hybrid hierarchical reinforcement learning for online guidance and navigation with partial observability publication-title: Neurocomputing – volume: 8 start-page: 208016 year: 2020 end-page: 208044 ident: b1255 article-title: Deep reinforcement learning for traffic signal control: A review publication-title: IEEE Access – year: 1982 ident: b0745 article-title: The hedonistic neuron: A theory of memory, learning, and intelligence – volume: 21 start-page: 363 year: 2006 end-page: 372 ident: b1100 article-title: Autonomous inverted helicopter flight via reinforcement learning. Experimental Robotics IX publication-title: Springer Tracts in Advanced Robotics – start-page: 3207 year: 2018 end-page: 3214 ident: b0575 article-title: Deep reinforcement learning that matters publication-title: Proceedings of the 32nd AAAI Conference on Artificial Intelligence – volume: 84 start-page: 109 year: 2011 end-page: 136 ident: b1400 article-title: Informing sequential clinical decision-making through reinforcement learning: An empirical study publication-title: Machine Learning – volume: 22 start-page: 123 year: 1996 end-page: 158 ident: b1455 article-title: Reinforcement learning with replacing eligibility traces publication-title: Machine Learning – volume: 7 start-page: 617 year: 2020 end-page: 626 ident: b0925 article-title: Parallel reinforcement learning-based energy efficiency improvement for a cyber-physical system publication-title: IEEE/CAA Journal of Automatica Sinica – reference: Silver, D., van Hasselt, H., Hessel, M., Schaul, T., Guez, A., Harley, T., Dulac-Arnold, G., Reichert, D., Rabinowitz, N., Barreto, A., & Degris, T. (2017b). The predictron: End-to-end learning and planning. In – volume: 69 start-page: 8554 year: 2022 end-page: 8565 ident: b0935 article-title: Deep reinforcement learning-based demand response for smart facilities energy management publication-title: IEEE Transactions on Industrial Electronics – volume: 54 start-page: 1 year: 2021 end-page: 35 ident: b1175 article-title: Hierarchical reinforcement learning: A comprehensive survey publication-title: ACM Computing Survey – reference: Li, W., & Todorov, E. (2004). Iterative linear quadratic regulator design for nonlinear biological movement systems. In – reference: (pp. 1352–1361). 
– volume: 588 start-page: 604 year: 2020 end-page: 609 ident: b1350 article-title: Mastering Atari, Go, chess and shogi by planning with a learned model publication-title: Nature – volume: 57 start-page: 469 year: 2009 end-page: 483 ident: b0065 article-title: A survey of robot learning from demonstration publication-title: Robotics and Autonomous Systems – volume: 6 start-page: 236 year: 2019 end-page: 246 ident: b0255 article-title: Parallel planning: A new motion planning framework for autonomous driving publication-title: IEEE/CAA Journal of Automatica Sinica – year: 1996 ident: b0185 article-title: Neuro-dynamic programming – volume: 50 start-page: 119 year: 2020 end-page: 138 ident: b0090 article-title: From inverse optimal control to inverse reinforcement learning: A historical review publication-title: Annual Reviews in Control – reference: Barreto, A., Dabney, W., Munos, R., Hunt, J. J., Schaul, T., Van Hasselt, H., & Silver, D. (2017). Successor features for transfer in reinforcement learning. In – start-page: 1 year: 2019 end-page: 12 ident: b1005 article-title: Guided meta-policy search publication-title: Proceedings of the 33rd Conference on Neural Information Processing Systems 32 – start-page: 312 year: 1996 end-page: 317 ident: b0530 article-title: Adapting arbitrary normal mutation distributions in evolution strategies: The covariancematrix adaptation publication-title: Proceedings of the IEEE International Conference on Evolutionary Computation – reference: Jiang, R., Zahavy, T., Xu, Z., White, A., Hessel, M., Blundell, C., & Hasselt, H. Van. (2021). Emphatic algorithms for deep reinforcement learning. In – reference: Devlin, J., Chang, M., Kenton, L., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In – reference: Laroche, R., Trichelair, P., & Combes, R. T. D. (2019). Safe policy improvement with baseline bootstrapping. In – volume: 23 start-page: 740 year: 2022 end-page: 759 ident: b0060 article-title: Survey of deep reinforcement learning for motion planning of autonomous vehicles publication-title: IEEE Transactions On Intelligent Transportation Systems – reference: Nadjahi, K., Laroche, R., & Combes, R. T. (2019). Safe policy improvement with soft baseline bootstrapping. arXiv preprint arXiv: 1907.05079v1. – reference: (pp. 7304-7312). – reference: Azar, M. G., Osband, I., & Munos, R. (2017). Minimax regret bounds for reinforcement learning. In – reference: Farahmand, A. M., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2008). Regularized policy iteration. In – reference: (pp. 1204–1212). – reference: . – reference: Riedmiller, M., Hafner, R., Lampe, T., Neunert, M., Degrave, J., Van De Wiele, T., & Springenberg, T. (2018). Learning by playing - Solving sparse reward tasks from scratch. In – reference: Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot. M., Sonnerat. N., Leibo. J. Z., Tuyls. K., & Graepel, T. (2018). Value-decomposition networks for cooperative multi-agent learning. In – reference: Bakhtin, A., Wu, D. J., Lerer, A., Gray, J., Jacob, A. P., Farina, G., Miller, A. H., & Brown, N. (2022). Mastering the game of no-press diplomacy via human-regularized reinforcement learning and planning. arXiv preprint arXiv:2210.05492v1. – reference: Chaffre, T., Moras, J., Chan-Hon-Tong, A., & Marzat, J. (2020). Sim-to-real transfer with incremental environment complexity for reinforcement learning of depth-based robot navigation. 
In – start-page: 1 year: 2010 end-page: 7 ident: b0600 article-title: Multiobjective reinforcement learning for traffic signal control using vehicular ad hoc network publication-title: EURASIP Journal on Advances in Signal Processing – start-page: 1 year: 2015 end-page: 11 ident: b1090 article-title: Language understanding for text-based games using deep reinforcement learning publication-title: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing – reference: Lee, A. X., Nagabandi, A., Abbeel, P., & Levine, S. (2020). Stochastic latent actor-critic : Deep reinforcement learning with a latent variable model. In – reference: (pp. 3191-3199). – reference: Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., & Mordatch, I. (2020). Multi-agent actor-critic for mixed cooperative-competitive environments. In – reference: (pp. 1184-1194). – reference: (pp. 295-300). – reference: Badia, A. P., Sprechmann, P., Vitvitskyi, A., Guo, D., Piot, B., Kapturowski, S., Tieleman, O., Arjovsky, M., Pritzel, A., Bolt, A., & Blundell, C. (2020b). Never give up : Learning directed exploration strategies. arXiv preprint arXiv:2002.06038. – reference: Ha, D., & Eck, D. (2017). A neural representation of sketch drawings. arXiv preprint arXiv:1704.03477v4. – start-page: 5998 year: 2017 end-page: 6008 ident: b1615 article-title: Attention is all you need publication-title: Proceedings of the 31st International Conference on Neural Information Processing Systems – start-page: 305 year: 1989 end-page: 313 ident: b1215 article-title: ALVINN: An autonomous land vehicle in a neural network publication-title: Proceedings of the 1st International Conference on Advances in Neural Information Processing Systems – reference: 80 (pp. 1407–1416). – volume: 6 start-page: 5223 year: 2021 end-page: 5230 ident: b1310 article-title: Socially compliant robot navigation in crowded environment by human behavior resemblance using deep reinforcement learning publication-title: IEEE Robotics and Automation Letters – reference: (pp. 1859–1864). – year: 1911 ident: b1590 article-title: Animal intelligence – reference: Pong, V. H., Nair, A., Smith, L., Huang, C., & Levine, S. (2022). Offline meta-reinforcement learning with online self-supervision. arXiv preprint arXiv: 2107.03974v4. – volume: 29 start-page: 2063 year: 2018 end-page: 2079 ident: b0965 article-title: Applications of deep learning and reinforcement learning to biological data publication-title: IEEE Transactions on Neural Networks and Learning Systems – reference: (pp. 1691-1696). – reference: Rashid, T., Farquhar, G., Peng, B., & Whiteson, S. (2020). Weighted QMIX : Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. 
In – start-page: 3986 year: 2018 end-page: 3995 ident: b1160 article-title: Reinforcement learning with function-valued action spaces for partial differential equation control publication-title: Proceedings of the 35th International Conference on Machine Learning – volume: 50 start-page: 3826 year: 2020 end-page: 3839 ident: b1115 article-title: Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications publication-title: IEEE Transactions On Cybernetics – volume: 213 start-page: 1 year: 2023 end-page: 13 ident: b0905 article-title: REDRL: A review-enhanced deep reinforcement learning model for interactive recommendation publication-title: Expert Systems With Applications – volume: 159 start-page: 96 year: 2019 end-page: 109 ident: b0225 article-title: Adversarial environment reinforcement learning algorithm for intrusion detection publication-title: Computer Networks – reference: Melo, F. S., Meyn, S. P., & Ribeiro, M. I. (2008). An analysis of reinforcement learning with function approximation. In – volume: 243 start-page: 1 year: 2022 end-page: 10 ident: b1735 article-title: FusionSum: Abstractive summarization with sentence fusion and cooperative reinforcement learning publication-title: Knowledge-Based Systems – reference: Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., & Mordatch, I. (2019). Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv: 1909.07528. – start-page: 1 year: 2022 end-page: 21 ident: b0465 article-title: RLAS-BIABC: A reinforcement learning-based answer selection using the bert model boosted by an improved ABC algorithm publication-title: Computational Intelligence and Neuroscience – reference: Vinyals, O., Ewalds, T., Bartunov, S., Georgiev, P., Vezhnevets, A. S., Yeo, M., Makhzani, A., Küttler, H., Agapiou, J., Schrittwieser, J., Quan, J., Gaffney, S., Petersen, S., Simonyan, K., Schaul, T., Hasselt, H. V., Silver, D., Lillicrap, T., Calderone, K., Keet, P., Brunasso, A., Lawrence, D., Ekermo, A., Repp, J., & Tsing, R. (2017). StarCraft II: A new challenge for reinforcement learning. arXiv preprint arXiv: 1708.04782. – reference: Peters, J., & Schaal, S. (2007). Applying the episodic natural actor-critic architecture to motor primitive learning. In – reference: (pp. 2961–2970). – volume: 27 start-page: 846 year: 2022 end-page: 857 ident: b1820 article-title: Rule-based reinforcement learning for efficient robot navigation with space reduction publication-title: IEEE/ASME Transactions on Mechatronics – volume: 199 start-page: 1 year: 2022 end-page: 32 ident: b1125 article-title: Reinforcement learning in urban network traffic signal control: A systematic literature review publication-title: Expert Systems With Applications – reference: (pp. 1787–1798). – volume: 388 start-page: 12 year: 2020 end-page: 23 ident: b1725 article-title: Integration of an actor-critic model and generative adversarial networks for a Chinese calligraphy robot publication-title: Neurocomputing – reference: Engel, Y., Mannor, S., & Ron, M. (2005). Reinforcement learning with Gaussian processes. In – volume: 15 start-page: 319 year: 2001 end-page: 350 ident: b0130 article-title: Infinite-horizon policy-gradient estimation publication-title: Journal of Artificial Intelligence Research – reference: (pp. 443–451). 
– volume: 10 start-page: 390 year: 1965 end-page: 398 ident: b1645 article-title: A heuristic approach to reinforcement learning control systems publication-title: IEEE Transactions on Automatic Control – volume: 1977 start-page: 25 year: 1977 end-page: 38 ident: b1700 article-title: Advanced forecasting methods for global crisis warning and models of intelligence publication-title: General Systems, XXI I – reference: Liu, F., & Qian, C. (2021). Prediction guided meta-learning for multi-objective reinforcement learning. In – start-page: 1 year: 2018 end-page: 14 ident: b1220 article-title: Temporal difference models: Model-free deep RL for model-based control publication-title: Proceedings of the 6th International Conference on Learning Representations (ICLR) – volume: 62 start-page: 104 year: 2016 end-page: 115 ident: b0345 article-title: Neural networks based reinforcement learning for mobile robots obstacle avoidance publication-title: Expert Systems With Applications – reference: (pp. 19–26). – volume: 8 start-page: 3075 year: 2021 end-page: 3087 ident: b1790 article-title: CDDPG: A deep-reinforcement-learning-based approach for electric vehicle charging control publication-title: IEEE Internet of Things Journal – reference: , – volume: 468 year: 2022 ident: b1775 article-title: Deep neural networks based temporal-difference methods for high-dimensional parabolic partial differential equations publication-title: Journal of Computational Physics – reference: (pp. 22–31). – reference: Bloembergen, D., Kaisers, M., & Tuyls, K. (2010). Lenient frequency adjusted Q-learning. In – reference: (pp. 1–21). – volume: 49 start-page: 8 year: 1961 end-page: 30 ident: b1025 article-title: Steps Toward Artificial Intelligence publication-title: Proceedings of the IRE – volume: 6 start-page: 503 year: 2005 end-page: 556 ident: b0360 article-title: Tree-based batch mode reinforcement learning publication-title: Journal of Machine Learning Research – volume: 5 start-page: 297 year: 1966 end-page: 303 ident: b1000 article-title: A survey of learning control systems publication-title: ISA Transactions – reference: Mnih, V., Badia, A. P., Mirza, L., Graves, A., Harley, T., Lillicrap, T. P., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In – volume: 12 start-page: 1057 year: 2000 end-page: 1063 ident: b1550 article-title: Policy gradient methods for reinforcement learning with function approximation publication-title: Advances in Neural Information Processing Systems – reference: (pp. 2587–2601). – volume: 61 start-page: 1848 year: 2013 end-page: 1862 ident: b0695 article-title: QD-Learning : A collaborative distributed strategy for multi-agent reinforcement learning through publication-title: IEEE Transactions on Signal Process – reference: Crites, R. H., & Barto, A. G. (1994). An actor / critic algorithm that equivalent to Q-learning. 
In – year: 2019 ident: b1235 article-title: Language models are unsupervised multitask learners – volume: 114 start-page: 1 year: 2022 end-page: 18 ident: b0015 article-title: Cyber-security and reinforcement learning — A brief survey publication-title: Engineering Applications of Artificial Intelligence – volume: 13 start-page: 103 year: 1993 end-page: 130 ident: b1055 article-title: Prioritized sweeping: Reinforcement learning with less data and less time publication-title: Machine Learning – volume: 7 start-page: 6638 year: 2022 end-page: 6645 ident: b0195 article-title: VesNet-RL: Simulation-based reinforcement learning for real-world US probe navigation publication-title: IEEE Robotics and Automation Letters – reference: (pp. 661-670). – reference: (pp. 449-458). – start-page: 1433 year: 2008 end-page: 1438 ident: b1585 article-title: Maximum entropy inverse reinforcement learning brian publication-title: Proceedings of the 23rd AAAI Conference on Artificial Intelligence – reference: Palmer, G., Tuyls, K., Bloembergen, D., & Savani, R. (2018). Lenient multi-agent deep reinforcement learning. In – reference: (pp. 1–20). – reference: Maei, H. R., Szepesvari, C., Bhatnagar, S., Precup, D., Silver, D., & Sutton, R. S. (2009). Convergent temporal-difference learning with arbitrary smooth function approximation. In – reference: Rummery, G. A., & Niranjan, M. (1994). On-Line Q-Learning using connectionist systems. In – volume: 31 start-page: 1573 year: 2022 end-page: 1586 ident: b0915 article-title: Video summarization through reinforcement with a 3D spatio-temporal U-net publication-title: IEEE Transactions on Image Processing – volume: 13 start-page: 227 year: 2000 end-page: 303 ident: b0320 article-title: Hierarchical reinforcement learning with the MAXQ value function decomposition publication-title: Journal of Artificial Intelligence Research – reference: Watkins, C. J. C. H. (1989). Learning from delayed rewards, King’s College Cambridge, Ph.D. thesis. – reference: Anderson, R. N., Boulanger, A., Powell, W. B., & Scott, W. (2011). Adaptive stochastic control for the smart grid. In – reference: (pp. 465-472). – volume: 46 start-page: 8 year: 2018 end-page: 28 ident: b0220 article-title: Reinforcement learning for control: Performance, stability, and deep approximators publication-title: Annual Reviews in Control – volume: 13 start-page: 3041 year: 2012 end-page: 3074 ident: b0815 article-title: Finite-sample analysis of least-squares policy iteration publication-title: Journal of Machine Learning Research – volume: 127 start-page: 282 year: 2019 end-page: 294 ident: b1395 article-title: Reinforcement learning –Overview of recent progress and implications for process control publication-title: Computers and Chemical Engineering – reference: (pp. 387-395). 
– start-page: 4246 year: 2016 end-page: 4247 ident: b0665 article-title: The malmo platform for artificial intelligence experimentation publication-title: Proceedings of the 25th International Joint Conference on Artificial Intelligence – reference: D., Weller, – start-page: 1725 year: 2014 end-page: 1732 ident: b0700 article-title: Large-scale video classification with convolutional neural networks publication-title: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition – volume: 9 start-page: 1 year: 2019 end-page: 19 ident: b1665 article-title: A text abstraction summary model based on BERT word embedding and reinforcement learning publication-title: Applied Sciences – reference: Mingshuo, N., Dongming, C., & Dongqi, W. (2022). Reinforcement learning on graph: A survey. arXiv preprint arXiv:2204.06127v3. – reference: (pp. 563-568). – reference: (pp. 2252–2260). – volume: 555 start-page: 604 year: 2018 end-page: 610 ident: b1380 article-title: Planning chemical syntheses with deep neural networks and symbolic AI publication-title: Nature – volume: 32 start-page: 1238 year: 2013 end-page: 1274 ident: b0755 article-title: Reinforcement learning in robotics: A survey publication-title: International Journal of Robotics Research – reference: (pp. 3682–3690). – reference: Rashid, T., Samvelyan, M., Witt, C. S. de, Farquhar, G., Foerster, J., & Whiteson, S. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In – year: 1960 ident: b0605 article-title: Dynamic Programming and Markov Processes – reference: Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2020). Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv: 1912.01603v3. – start-page: 663 year: 2000 end-page: 670 ident: b1105 article-title: Algorithms for inverse reinforcement learning publication-title: Proceedings of the 17th International Conference on Machine Learning – reference: Turing, A. (1948). Intelligent machinery: Report for National physical laboratory universal turing machine. – volume: 71 start-page: 1180 year: 2008 end-page: 1190 ident: b1200 article-title: Natural actor-critic publication-title: Neurocomputing – volume: 49 start-page: 337 year: 2019 end-page: 349 ident: b0865 article-title: Human-centered reinforcement learning: A survey publication-title: IEEE Transactions on Human-Machine Systems – reference: Nagabandi, A., Kahn, G., Fearing, R. S., & Levine, S. (2017). Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. arXiv preprint arXiv: 1708.02596v2. – reference: Stockfish: Strong open source chess engine. (2022). Retrieved from https://stockfishchess.org/. Accessed March 10, 2023. – volume: 6 start-page: S191 year: 2020 ident: b0045 article-title: Introduction to deep learning publication-title: MIT Course Number – reference: Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., Abbeel, P., Srinivas, A., & Mordatch, I. (2021). Decision transformer : Reinforcement learning via sequence modeling. arXiv preprint arXiv: 2106.01345. – volume: 45 start-page: 2471 year: 2009 end-page: 2482 ident: b0190 article-title: Natural actor-critic algorithms publication-title: Automatica – volume: 4 start-page: 1107 year: 2003 end-page: 1149 ident: b0805 article-title: Least-squares policy iteration publication-title: Journal of Machine Learning Research – reference: Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. 
Z., Munos, R., Blundell, C., Kumaran, D., & Botvinick, M. (2016a). Learning to reinforcement learn. arXiv preprint arXiv: 1611.05763v3. – reference: Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., Hasselt, H., & Silver, D. (2018). Distributed prioritized experience replay. In – start-page: 448 year: 2015 end-page: 456 ident: b0630 article-title: Batch normalization: Accelerating deep network training by reducing internal covariate shift publication-title: 32nd International Conference on Machine Learning (ICML) – reference: Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In – volume: 22 start-page: 1 year: 2020 end-page: 13 ident: b0535 article-title: Entanglement classification via neural network quantum states publication-title: New Journal of Physics – reference: Hasselt, H. Van, Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-Learning. In – volume: 602 start-page: 298 year: 2022 end-page: 312 ident: b1675 article-title: A reinforcement learning level-based particle swarm optimization algorithm for large-scale optimization publication-title: Information Sciences – reference: Moerland, T. M., Broekens, J., Plaat, A., & Jonker., C. M. (2022). Model-based reinforcement learning: A Survey. arXiv preprint arXiv: 2006.16712v4. – reference: Peters, J., Mulling, K., & Altun, Y. (2010). Relative entropy policy search. I – reference: Paine, T. L., Paduraru, C., Michi, A., Gulcehre, C., Żołna, K., Novikov, A., Wang. Z., & Freitas, N. de. (2020). Hyperparameter selection for offline reinforcement learning. arXiv preprint arXiv:2007.09055. – volume: 70 start-page: 377 year: 2021 end-page: 380 ident: b0625 article-title: Integrated process-system modelling and control through graph neural network and reinforcement learning publication-title: CIRP Annals – reference: Badia, A. P., Piot, B., Kapturowski, S., Sprechmann, P., Vitvitskyi, A., Guo, D., & Blundell, C. (2020a). Agent57: Outperforming the atari human benchmark. arXiv preprint arXiv: 2003.13350v1. – reference: Swazinna, P., Udluft, S., & Runkler, T. (2021). Overcoming model bias for robust offline deep reinforcement learning. arXiv preprint arXiv:2008.05533v4. – volume: 10 start-page: 2133 year: 2009 end-page: 2136 ident: b1570 article-title: RL-Glue: Language-independent software for reinforcement-learning experiments publication-title: Journal of Machine Learning Research – volume: 67 start-page: 1 year: 2021 end-page: 9 ident: b0115 article-title: Deep neural network based missing data prediction of electrocardiogram signal using multiagent reinforcement learning publication-title: Biomedical Signal Processing and Control – reference: (pp. 195-206). – reference: Achiam, J., Held, D., Tamar, A., & Abbeel, P. (2017). Constrained policy optimization. In – reference: ( – reference: (pp. 448–455). – year: 2018 ident: b1620 article-title: Programmatically interpretable reinforcement learning publication-title: Proceedings of the 35th International Conference on Machine Learning (PMLR) – volume: 7 year: 2021 ident: b1245 article-title: Autonomous reinforcement learning agent for chemical vapor deposition synthesis of quantum materials publication-title: npj Computational Materials – volume: 73 start-page: 1 year: 2021 end-page: 20 ident: b1815 article-title: Deep reinforcement learning in medical imaging: A literature review publication-title: Medical Image Analysis – reference: (pp. 864–871). 
| Title: | Reinforcement learning algorithms: A brief survey |
| Volume: | 231 |
| Start page: | 120495 |
| Subjects: | Deep Reinforcement Learning (DRL); Function approximation; Reinforcement learning; Stochastic optimal control |
| URI: | https://dx.doi.org/10.1016/j.eswa.2023.120495 |