Reinforcement learning algorithms: A brief survey

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 231, Article 120495
Main Authors: Ashish Kumar Shakya, Gopinatha Pillai, Sohom Chakrabarty
Format: Journal Article
Language: English
Published: Elsevier Ltd, 30 November 2023
ISSN: 0957-4174 (print); 1873-6793 (online)
Abstract

Highlights:
• RL can be used to solve problems involving sequential decision-making.
• RL is based on trial-and-error learning through rewards and punishments.
• The ultimate goal of an RL agent is to maximize cumulative reward.
• An RL agent tries to learn the optimal value and policy functions.
• DNN-based function approximation is used to approximate the value and policy functions.

Reinforcement Learning (RL) is a machine learning (ML) technique for learning sequential decision-making in complex problems. RL is inspired by the trial-and-error learning of humans and animals. An RL agent can learn an optimal policy autonomously from knowledge obtained through continuous interaction with a stochastic dynamical environment. Problems once considered virtually impossible to solve, such as learning to play video games from pixel information alone, have now been solved successfully using deep reinforcement learning. Without human intervention, RL agents can surpass human performance on challenging tasks. This review gives a broad overview of RL, covering its fundamental principles, essential methods, and illustrative applications, and aims to serve as an initial reference point for researchers beginning work in RL. The review covers fundamental model-free RL algorithms and pathbreaking function-approximation-based deep RL (DRL) algorithms for complex, uncertain tasks with continuous action and state spaces, which have made RL useful in various interdisciplinary fields. The article also briefly reviews model-based and multi-agent RL approaches. Finally, some promising research directions for RL are presented.
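To make the abstract's core ideas concrete (trial-and-error interaction, cumulative reward, and a learned value function), here is a minimal tabular Q-learning sketch. The five-state chain environment, reward scheme, and hyperparameters are illustrative assumptions for this note, not anything taken from the surveyed paper.

```python
import random

# Toy 5-state chain (hypothetical): states 0..4, action 0 moves left,
# action 1 moves right, and reward 1.0 is given only on reaching the goal.
N_STATES, GOAL = 5, 4
MOVES = [-1, +1]

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[s][a]: estimated cumulative reward
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1          # learning rate, discount, exploration

for _ in range(500):                         # trial-and-error episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit current value estimates, sometimes explore
        a = random.randrange(2) if random.random() < EPS else (0 if Q[s][0] > Q[s][1] else 1)
        s2, r, done = step(s, a)
        # temporal-difference update toward reward + discounted best next value
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# Greedy policy read off the learned values: expect "right" before the goal.
print(["left" if Q[s][0] > Q[s][1] else "right" for s in range(N_STATES)])
```

In the deep RL (DRL) setting the review emphasizes, the table Q is replaced by a DNN function approximator, which is what lets the same update idea scale to continuous state and action spaces.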
Authors:
1. Ashish Kumar Shakya (akumarshakya@ee.iitr.ac.in)
2. Gopinatha Pillai (gn.pillai@ee.iitr.ac.in)
3. Sohom Chakrabarty (sohom.chakrabarty@ee.iitr.ac.in; ORCID: 0000-0001-7213-6693)
Copyright: 2023 Elsevier Ltd
DOI: 10.1016/j.eswa.2023.120495
Discipline: Computer Science
Peer Reviewed: Yes
Scholarly: Yes
Keywords: Deep Reinforcement Learning (DRL); Function approximation; Stochastic optimal control; Reinforcement learning
References Sutton, McAllester, Singh, Mansour (b1550) 2000; 12
2021.
Stockfish: Strong open source chess engine. (2022). Retrieved from https://stockfishchess.org/. Accessed March 10, 2023.
Kalyanakrishnan, Stone (b0685) 2007
Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual multi-agent policy gradients. In
(pp. 501–510).
Silver, D., van Hasselt, H., Hessel, M., Schaul, T., Guez, A., Harley, T., Dulac-Arnold, G., Reichert, D., Rabinowitz, N., Barreto, A., & Degris, T. (2017b). The predictron: End-to-end learning and planning. In
Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., & Mordatch, I. (2020). Multi-agent actor-critic for mixed cooperative-competitive environments. In
Mingshuo, N., Dongming, C., & Dongqi, W. (2022). Reinforcement learning on graph: A survey. arXiv preprint arXiv:2204.06127v3.
Bakhtin, A., Wu, D. J., Lerer, A., Gray, J., Jacob, A. P., Farina, G., Miller, A. H., & Brown, N. (2022). Mastering the game of no-press diplomacy via human-regularized reinforcement learning and planning. arXiv preprint arXiv:2210.05492v1.
Kirsch, L., Steenkiste, S. Van, & Schmidhuber, J. (2020). Improving generalization in meta reinforcement learning using learned objectives. arXiv preprint arXiv:1910.04098.
Radford, Wu, Child, Luan, Amodei, Sutskever (b1235) 2019
Moody, Saffell (b1050) 2001; 12
Verma, Murali, Singh, Kohli, Chandhuri (b1620) 2018
Gharagozlou, Mohammadzadeh, Bastanfard, Ghidary (b0465) 2022
(pp. 222–229).
Luo, J., Li, C., Fan, Q., & Liu, Y. (2022b). A graph convolutional encoder and multi-head attention decoder network for TSP via reinforcement learning.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv: 1707.06347.
Maes, F., Fonteneau, R., Wehenkel, L., & Ernst, D. (2012). Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning. In: Ganascia, JG., Lenca, P., Petit, JM. (eds)
Fujimoto, S., Meger, D., & Precup, D. (2019). Off-policy deep reinforcement learning without exploration. In
,
.
Sun, Lan, Li, Guo, Hu, Hu (b1505) 2020; 183
Abdoos, M., Mozayani, N., & Bazzan, A. L. C. (2011). Traffic light control in non-stationary environments based on multi agent Q-learning. In
Peters, J., & Schaal, S. (2007). Applying the episodic natural actor-critic architecture to motor primitive learning. In
Mazyavkina, Sviridov, Ivanov, Burnaev (b0990) 2021; 134
Yu, Zhang, Jiang, Yang, Shang (b1765) 2021; 173
Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P. P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Pinto, H, P. d. O., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., & Zhang, S. (2019). Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680.
Nagabandi, A., Kahn, G., Fearing, R. S., & Levine, S. (2017). Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. arXiv preprint arXiv: 1708.02596v2.
Gao, Y., Xu, H., Lin, Ji., Yu, F., Levine, S., & Darrell, T. (2018). Reinforcement learning from imperfect demonstrations. arXiv preprint arXiv: 1802.05313.
Badia, A. P., Piot, B., Kapturowski, S., Sprechmann, P., Vitvitskyi, A., Guo, D., & Blundell, C. (2020a). Agent57: Outperforming the atari human benchmark. arXiv preprint arXiv: 2003.13350v1.
Schulman (b1355) 2016
Lecun, Bengio, Hinton (b0825) 2015; 521
Nguyen, Nguyen, Nahavandi (b1110) 2017; 5
Johnson, Hofmann, Hutton, Bignell (b0665) 2016
Silver, Hubert, Schrittwieser, Antonoglou, Lai, Guez, Hassabis (b1410) 2018; 362
Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-critic algorithms. In
Sutton, Singh (b1555) 1999; 112
(pp. 2672–2680).
Liu, Tian, Ai, Wang (b0925) 2020; 7
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Polosukhin (b1615) 2017
Chow, Ghavamzadeh, Janson, Pavone (b0270) 2017; 18
Hua, Li, Zhao, Zhang, Chen (b0615) 2020
Maei, H. R., Szepesvari, C., Bhatnagar, S., Precup, D., Silver, D., & Sutton, R. S. (2009). Convergent temporal-difference learning with arbitrary smooth function approximation. In
Lin (b0895) 1992; 8
Rashid, T., Samvelyan, M., Witt, C. S. de, Farquhar, G., Foerster, J., & Whiteson, S. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In
Ståhl, Falkman, Karlsson, Mathiason, Boström (b1475) 2019; 59
Peng, P., Wen, Y., Yang, Y., Yuan, Q., Tang, Z., Long, H., & Wang, J. (2017). Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games. arXiv preprint arXiv: 1703.10069.
Konda, Tsitsiklis (b0770) 2003; 42
Singh, Man, Kearns, Walker (b1450) 2002; 16
(pp. 443–451).
(pp. 1889–1897).
Sukhbaatar, S., Szlam, A., & Fergus, R. (2016). Learning multiagent communication with backpropagation. In
Salakhutdinov, R., & Hinton, G. (2009). Deep Boltzmann Machines. In
Schmitt, S., Hessel, M., & Simonyan, K. (2019). Off-policy actor-critic with shared experience replay. arXiv preprint arXiv:1909.11583.
(pp. 202–211).
Anderson, R. N., Boulanger, A., Powell, W. B., & Scott, W. (2011). Adaptive stochastic control for the smart grid. In
S., & McFall, J. (2013). Concurrent reinforcement learning from customer interactions. In
(pp. 1048–1056).
Moerland, T. M., Broekens, J., Plaat, A., & Jonker., C. M. (2022). Model-based reinforcement learning: A Survey. arXiv preprint arXiv: 2006.16712v4.
(pp. 2587–2601).
(ICRA 2004) (pp. 2619–2624).
Sun, P., Zhou, W., & Li, H. (2020b). Attentive experience replay. In
(pp. 10199–10210).
(pp. 2961–2970).
Kidambi, R., Rajeswaran, A., Netrapalli, P., & Joachims, T. (2020). MOReL: Model-based offline reinforcement learning. In
Wang, Wang, Sun (b1675) 2022; 602
(pp. 1008–1014).
(pp. 401–408).
Chen, Chen, Tan, Long, Gasic, Yu (b0250) 2019; 27
(pp. 1–13).
Matignon, L., Laurent, G. J., & Fort-piat, N. Le. (2007). Hysteretic Q-Learning : an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In
L., Barker
(pp. 263-272).
Thanh, An, Chien (b1585) 2008
Kyaw, Paing, Thu, Mohan, Le, Veerajagadheswar (b0795) 2020; 8
Aradi (b0060) 2022; 23
Jaques, N., Gu, S., Bahdanau, D., Herńandez-Lobato, J. M., Turner, R. E., and Eck, D. (2017). Sequence tutor: Conservative fine-tuning of sequence generation models with KL-control. In
Ding, Xu, Gao, Shen (b0330) 2022; 9
Glanois, C., Weng, P., Zimmer, M., Li, D., Yang, T., Hao, J., & Liu, W. (2022). A survey on interpretable reinforcement learning. arXiv preprint arXiv: 2112.13112v2.
Yin, Chen, Liu, Huang, Gao (b1750) 2021; 106
(pp. 4344–4353).
Bi, Jiang, Gao, Wendler, Karlas, Navab (b0195) 2022; 7
(pp. 6118-6128).
(pp. 2252–2260).
Argall, Chernova, Veloso, Browning (b0065) 2009; 57
Lee, A. X., Nagabandi, A., Abbeel, P., & Levine, S. (2020). Stochastic latent actor-critic : Deep reinforcement learning with a latent variable model. In
Feinberg, V., Wan, A., Stoica, I., Jordan, M. I., Gonzalez, J. E., & Levine, S. (2018). Model-based value expansion for efficient model-free reinforcement learning. arXiv preprint arXiv: 1803.00101v1.
(pp. 4754-4765).
Khan, Gazara, Nofal, Chakrabarty, Dannoun, AL-Hmouz, Mursaleen (b0705) 2021; 9
Nadjahi, K., Laroche, R., & Combes, R. T. (2019). Safe policy improvement with soft baseline bootstrapping. arXiv preprint arXiv: 1907.05079v1.
Szita, Lorincz (b1565) 2006; 18
Gronauer, Diepold (b0480) 2022; 55
Schrittwieser, Antonoglou, Hubert, Simonyan, Sifre, Schmitt, Silver (b1350) 2020; 588
Na, Niu, Lennox, Arvin (b1065) 2022; 71
Fox, R., Pakman, A., & Tishby, N. (2016). Taming the noise in reinforcement learning via soft updates. In
(pp. 1–12).
Scheikl, P. M., Gyenes, B., Davitashvili, T., Younis, R., Schulze, A., Muller-Stich, B. P., Neumann. G., & Mathis-Ullrich, F. (2021). Cooperative assistance in robotic surgery through multi-agent reinforcement learning. In
(pp. 4295–4304).
Campbell, Hoane, Hsu (b0230) 2002; 134
Noaeen, Naik, Goodman, Crebo, Abrar, Abad, Far (b1125) 2022; 199
Silver, D., Newnham
Van Seijen, H., & Sutton, R. S. (2014). True online TD(λ). In
Radoglou-Grammatikis, Rompolos, Sarigiannidis, Argyriou, Lagkas, Sarigiannidis, Wan (b1240) 2022; 18
Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy-based policies. In
Foerster, J., Nardelli, N., Farquhar, G., Afouras, T., Torr, P. H. S., Kohli, P., & Whiteson, S. (2017). Stabilising experience replay for deep multi-agent reinforcement learning. In
Zhou, Le, Luu, Nguyen, Ayache (b1815) 2021; 73
Hausknecht, Lehman, Miikkulainen, Stone (b0555) 2014; 6
Engel, Y., Mannor, S., & Ron, M. (2005). Reinforcement learning with Gaussian processes. In
(pp. 201–208).
(pp. 881–888).
Bellemare, M. G., Veness, J., & Bowling, M. (2012). Investigating Contingency Awareness Using Atari 2600 Games. In
Ernst, Geurts, Wehenkel (b0360) 2005; 6
(pp. 1146-1155).
(pp. 314–323).
Deisenroth, M. P., & Rasmussen, C. E. (2011). PILCO: A model-based and data-efficient approach to policy search. In
Silver, Huang, Maddison, Guez, Sifre, van den Driessche, Hassabis (b1405) 2016; 529
(pp. 1098-1115).
(pp. 1691-1696).
Jaques, Lazaridou, Hughes, Gulcehre, Ortega, Strouse, Freitas (b0655) 2019
Li, Gomez, Nakamura, He (b0865) 2019; 49
Brockman, Cheung, Pettersson, Schneider, Schulman, Tang, Zaremba (b0210) 2016
Klopf (b0740) 1975
Peters, Schaal (b1205) 2008; 21
Willia (b1705) 1992; 8
Farahmand, A. M., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2008). Regularized policy iteration. In
Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot. M., Sonnerat. N., Leibo. J. Z., Tuyls. K., & Graepel, T. (2018). Value-decomposition networks for cooperative multi-agent learning. In
(pp. 441–448).
Nguyen, Nguyen, Nahavandi (b1115) 2020; 50
Buşoniu, de Bruin, Tolić, Kober, Palunko (b0220) 2018; 46
Jaksch, Ortner, Auer (b0645) 2010; 11
Ding, Lin, Shi, Yan (b0325) 2022
Iqbal, S., & Sha, F. (2019). Actor-attention-critic for multi-agent reinforcement learning. In
Chaffre, T., Moras, J., Chan-Hon-T
10.1016/j.eswa.2023.120495_b1365
10.1016/j.eswa.2023.120495_b0275
Li (10.1016/j.eswa.2023.120495_b0880) 2022; 40
10.1016/j.eswa.2023.120495_b1485
10.1016/j.eswa.2023.120495_b0395
10.1016/j.eswa.2023.120495_b0030
Liu (10.1016/j.eswa.2023.120495_b0905) 2023; 213
Afsar (10.1016/j.eswa.2023.120495_b0020) 2022; 55
10.1016/j.eswa.2023.120495_b0150
10.1016/j.eswa.2023.120495_b1360
10.1016/j.eswa.2023.120495_b1480
Subramanian (10.1016/j.eswa.2023.120495_b1490) 2022; 145
Zhang (10.1016/j.eswa.2023.120495_b1785) 2019; 4
Arwa (10.1016/j.eswa.2023.120495_b0075) 2020; 8
Bellman (10.1016/j.eswa.2023.120495_b0170) 1972
Bellman (10.1016/j.eswa.2023.120495_b0165) 1958; 1
Jumper (10.1016/j.eswa.2023.120495_b0670) 2021; 596
Vinyals (10.1016/j.eswa.2023.120495_b1625) 2019; 575
Baxter (10.1016/j.eswa.2023.120495_b0130) 2001; 15
Zhu (10.1016/j.eswa.2023.120495_b1825) 2021; 26
He (10.1016/j.eswa.2023.120495_b0565) 2016
Liu (10.1016/j.eswa.2023.120495_b0925) 2020; 7
10.1016/j.eswa.2023.120495_b0945
Zhang (10.1016/j.eswa.2023.120495_b1780) 2022; 191
10.1016/j.eswa.2023.120495_b0940
Zhou (10.1016/j.eswa.2023.120495_b1815) 2021; 73
10.1016/j.eswa.2023.120495_b0025
Michie (10.1016/j.eswa.2023.120495_b1010) 1968; 2
Radford (10.1016/j.eswa.2023.120495_b1235) 2019
10.1016/j.eswa.2023.120495_b0385
10.1016/j.eswa.2023.120495_b0140
Konda (10.1016/j.eswa.2023.120495_b0770) 2003; 42
10.1016/j.eswa.2023.120495_b0260
10.1016/j.eswa.2023.120495_b1470
Sutton (10.1016/j.eswa.2023.120495_b1525) 1990
Huang (10.1016/j.eswa.2023.120495_b0620) 2022; 64
Kar (10.1016/j.eswa.2023.120495_b0695) 2013; 61
Khan (10.1016/j.eswa.2023.120495_b0705) 2021; 9
Schuster (10.1016/j.eswa.2023.120495_b1375) 1997; 45
Haykin (10.1016/j.eswa.2023.120495_b0560) 2008
10.1016/j.eswa.2023.120495_b0810
10.1016/j.eswa.2023.120495_b0930
10.1016/j.eswa.2023.120495_b1225
10.1016/j.eswa.2023.120495_b0135
Ormoneit (10.1016/j.eswa.2023.120495_b1145) 2002; 49
10.1016/j.eswa.2023.120495_b1345
Zeng (10.1016/j.eswa.2023.120495_b1775) 2022; 468
Zhou (10.1016/j.eswa.2023.120495_b1810) 2019; 331
10.1016/j.eswa.2023.120495_b0495
Zhang (10.1016/j.eswa.2023.120495_b1795) 2018
10.1016/j.eswa.2023.120495_b0010
10.1016/j.eswa.2023.120495_b1340
Bertsekas (10.1016/j.eswa.2023.120495_b0180) 2005; vol. 1
10.1016/j.eswa.2023.120495_b0370
10.1016/j.eswa.2023.120495_b0490
Khayyat (10.1016/j.eswa.2023.120495_b0715) 2022; 81
Hu (10.1016/j.eswa.2023.120495_b0610) 2021; 9
Krishnan (10.1016/j.eswa.2023.120495_b0775) 2019; 38
Parisotto (10.1016/j.eswa.2023.120495_b1170) 2016
Moody (10.1016/j.eswa.2023.120495_b1050) 2001; 12
García (10.1016/j.eswa.2023.120495_b0460) 2020; 88
10.1016/j.eswa.2023.120495_b0920
Zhao (10.1016/j.eswa.2023.120495_b1805) 2020
10.1016/j.eswa.2023.120495_b0005
10.1016/j.eswa.2023.120495_b1335
10.1016/j.eswa.2023.120495_b0245
10.1016/j.eswa.2023.120495_b0365
10.1016/j.eswa.2023.120495_b1575
10.1016/j.eswa.2023.120495_b0485
HasanzadeZonuzy (10.1016/j.eswa.2023.120495_b0540) 2021
Peters (10.1016/j.eswa.2023.120495_b1205) 2008; 21
10.1016/j.eswa.2023.120495_b1695
10.1016/j.eswa.2023.120495_b0120
10.1016/j.eswa.2023.120495_b1330
10.1016/j.eswa.2023.120495_b0240
Fang (10.1016/j.eswa.2023.120495_b0380) 2021; 8
Xu (10.1016/j.eswa.2023.120495_b1740) 2014; 261
10.1016/j.eswa.2023.120495_b1690
Chow (10.1016/j.eswa.2023.120495_b0270) 2017; 18
Klopf (10.1016/j.eswa.2023.120495_b0745) 1982
Sutton (10.1016/j.eswa.2023.120495_b1540) 1998
Mendonca (10.1016/j.eswa.2023.120495_b1005) 2019
Mahmud (10.1016/j.eswa.2023.120495_b0965) 2018; 29
Segler (10.1016/j.eswa.2023.120495_b1380) 2018; 555
Bi (10.1016/j.eswa.2023.120495_b0195) 2022; 7
Apuroop (10.1016/j.eswa.2023.120495_b0055) 2021; 21
10.1016/j.eswa.2023.120495_b0910
Sutton (10.1016/j.eswa.2023.120495_b1530) 1981; 4
Tesauro (10.1016/j.eswa.2023.120495_b1580) 1995; 38
10.1016/j.eswa.2023.120495_b1325
10.1016/j.eswa.2023.120495_b0355
Wang (10.1016/j.eswa.2023.120495_b1660) 2023; 619
Schulman (10.1016/j.eswa.2023.120495_b1355) 2016
10.1016/j.eswa.2023.120495_b0870
10.1016/j.eswa.2023.120495_b1045
Polydoros (10.1016/j.eswa.2023.120495_b1210) 2017; 86
10.1016/j.eswa.2023.120495_b1285
Bhatnagar (10.1016/j.eswa.2023.120495_b0190) 2009; 45
Watter (10.1016/j.eswa.2023.120495_b1685) 2015
Bellman (10.1016/j.eswa.2023.120495_b0160) 1957; 6
Du (10.1016/j.eswa.2023.120495_b0335) 2020; 54
Mendel (10.1016/j.eswa.2023.120495_b1000) 1966; 5
10.1016/j.eswa.2023.120495_b1280
Morais (10.1016/j.eswa.2023.120495_b0305) 2020; 104
Soleymani (10.1016/j.eswa.2023.120495_b1460) 2021; 182
Van Seijen (10.1016/j.eswa.2023.120495_b1610) 2009
Gharagozlou (10.1016/j.eswa.2023.120495_b0465) 2022
Campbell (10.1016/j.eswa.2023.120495_b0230) 2002; 134
Mnih (10.1016/j.eswa.2023.120495_b1040) 2015; 518
Kobayashi (10.1016/j.eswa.2023.120495_b0750) 2020; 95
Wu (10.1016/j.eswa.2023.120495_b1720) 2017
Argall (10.1016/j.eswa.2023.120495_b0065) 2009; 57
10.1016/j.eswa.2023.120495_b0500
10.1016/j.eswa.2023.120495_b0860
Yu (10.1016/j.eswa.2023.120495_b1755) 2018
10.1016/j.eswa.2023.120495_b0980
10.1016/j.eswa.2023.120495_b1035
10.1016/j.eswa.2023.120495_b1155
Fu (10.1016/j.eswa.2023.120495_b0435) 2022; 50
10.1016/j.eswa.2023.120495_b1030
10.1016/j.eswa.2023.120495_b1150
10.1016/j.eswa.2023.120495_b1270
10.1016/j.eswa.2023.120495_b1390
Silver (10.1016/j.eswa.2023.120495_b1425) 2017; 550
Brockman (10.1016/j.eswa.2023.120495_b0210) 2016
Verma (10.1016/j.eswa.2023.120495_b1620) 2018
Wang (10.1016/j.eswa.2023.120495_b1670) 2016
Pateria (10.1016/j.eswa.2023.120495_b1175) 2021; 54
Rajak (10.1016/j.eswa.2023.120495_b1245) 2021; 7
Haarnoja (10.1016/j.eswa.2023.120495_b0505) 2018
Vaswani (10.1016/j.eswa.2023.120495_b1615) 2017
Fan (10.1016/j.eswa.2023.120495_b0375) 2022; 22
10.1016/j.eswa.2023.120495_b0735
Hausknecht (10.1016/j.eswa.2023.120495_b0555) 2014; 6
Werbos (10.1016/j.eswa.2023.120495_b1700) 1977; 1977
10.1016/j.eswa.2023.120495_b0975
Johnson (10.1016/j.eswa.2023.120495_b0665) 2016
10.1016/j.eswa.2023.120495_b0730
10.1016/j.eswa.2023.120495_b0850
Li (10.1016/j.eswa.2023.120495_b0855) 2022; 378
Omidshafiei (10.1016/j.eswa.2023.120495_b1140) 2017
10.1016/j.eswa.2023.120495_b0970
10.1016/j.eswa.2023.120495_b1265
Vo (10.1016/j.eswa.2023.120495_b1635) 2022; 26
10.1016/j.eswa.2023.120495_b0175
Buşoniu (10.1016/j.eswa.2023.120495_b0220) 2018; 46
Yin (10.1016/j.eswa.2023.120495_b1750) 2021; 106
Miljković (10.1016/j.eswa.2023.120495_b1015) 2013; 40
10.1016/j.eswa.2023.120495_b1020
Oh (10.1016/j.eswa.2023.120495_b1130) 2015
Zhang (10.1016/j.eswa.2023.120495_b1790) 2021; 8
10.1016/j.eswa.2023.120495_b0050
10.1016/j.eswa.2023.120495_b1260
Sutton (10.1016/j.eswa.2023.120495_b1555) 1999; 112
10.1016/j.eswa.2023.120495_b0290
Tsitsiklis (10.1016/j.eswa.2023.120495_b1595) 1997; 42
Claessens (10.1016/j.eswa.2023.120495_b0285) 2018; 9
Radoglou-Grammatikis (10.1016/j.eswa.2023.120495_b1240) 2022; 18
10.1016/j.eswa.2023.120495_b0845
Ding (10.1016/j.eswa.2023.120495_b0330) 2022; 9
10.1016/j.eswa.2023.120495_b0720
10.1016/j.eswa.2023.120495_b0840
10.1016/j.eswa.2023.120495_b0960
Ng (10.1016/j.eswa.2023.120495_b1105) 2000
10.1016/j.eswa.2023.120495_b1135
10.1016/j.eswa.2023.120495_b1495
10.1016/j.eswa.2023.120495_b1370
10.1016/j.eswa.2023.120495_b0280
Ernst (10.1016/j.eswa.2023.120495_b0360) 2005; 6
Harney (10.1016/j.eswa.2023.120495_b0535) 2020; 22
Lu (10.1016/j.eswa.2023.120495_b0935) 2022; 69
Thanh (10.1016/j.eswa.2023.120495_b1585) 2008
10.1016/j.eswa.2023.120495_b0835
10.1016/j.eswa.2023.120495_b0955
10.1016/j.eswa.2023.120495_b0830
Munos (10.1016/j.eswa.2023.120495_b1060) 2016
Pan (10.1016/j.eswa.2023.120495_b1160) 2018
Thorndike (10.1016/j.eswa.2023.120495_b1590) 1911
10.1016/j.eswa.2023.120495_b0310
10.1016/j.eswa.2023.120495_b1640
10.1016/j.eswa.2023.120495_b0550
Lagoudakis (10.1016/j.eswa.2023.120495_b0805) 2003; 4
10.1016/j.eswa.2023.120495_b1760
Singh (10.1016/j.eswa.2023.120495_b1455) 1996; 22
10.1016/j.eswa.2023.120495_b0790
Chen (10.1016/j.eswa.2023.120495_b0265) 2022; 13
Barto (10.1016/j.eswa.2023.120495_b0125) 2003; 13
10.1016/j.eswa.2023.120495_b1085
Hein (10.1016/j.eswa.2023.120495_b0570) 2017; 65
10.1016/j.eswa.2023.120495_b1080
Pomerleau (10.1016/j.eswa.2023.120495_b1215) 1989
Azar (10.1016/j.eswa.2023.120495_b0090) 2020; 50
Li (10.1016/j.eswa.2023.120495_b0875) 2023; 40
Gronauer (10.1016/j.eswa.2023.120495_b0480) 2022; 55
Wymann (10.1016/j.eswa.2023.120495_b1730) 2013; v1.3.5
10.1016/j.eswa.2023.120495_b1515
Wang (10.1016/j.eswa.2023.120495_b1675) 2022; 602
10.1016/j.eswa.2023.120495_b0425
Nguyen (10.1016/j.eswa.2023.120495_b1115) 2020; 50
10.1016/j.eswa.2023.120495_b0545
10.1016/j.eswa.2023.120495_b0785
10.1016/j.eswa.2023.120495_b1510
10.1016/j.eswa.2023.120495_b0420
Ioffe (10.1016/j.eswa.2023.120495_b0630) 2015
10.1016/j.eswa.2023.120495_b1630
Samsani (10.1016/j.eswa.2023.120495_b1310) 2021; 6
10.1016/j.eswa.2023.120495_b0660
Lazaric (10.1016/j.eswa.2023.120495_b0815) 2012; 13
10.1016/j.eswa.2023.120495_b0780
Rakelly (10.1016/j.eswa.2023.120495_b1250) 2019; 97
Henderson (10.1016/j.eswa.2023.120495_b0575) 2018
Jaksch (10.1016/j.eswa.2023.120495_b0645) 2010; 11
10.1016/j.eswa.2023.120495_b1195
Bellman (10.1016/j.eswa.2023.120495_b0155) 1956; 16
Sun (10.1016/j.eswa.2023.120495_b1500) 2021; 25
Kyaw (10.1016/j.eswa.2023.120495_b0795) 2020; 8
10.1016/j.eswa.2023.120495_b1070
Lecun (10.1016/j.eswa.2023.120495_b0825) 2015; 521
10.1016/j.eswa.2023.120495_b1190
Zhao (10.1016/j.eswa.2023.120495_b1800) 2021; 22
Amini (10.1016/j.eswa.2023.120495_b0040) 2020; 5
10.1016/j.eswa.2023.120495_b0415
Amini (10.1016/j.eswa.2023.120495_b0045) 2020; 6
Riedmiller (10.1016/j.eswa.2023.120495_b1275) 1999; 8
10.1016/j.eswa.2023.120495_b0410
10.1016/j.eswa.2023.120495_b0650
Fu (10.1016/j.eswa.2023.120495_b0430) 1970; 15
10.1016/j.eswa.2023.120495_b0890
10.1016/j.eswa.2023.120495_b1185
10.1016/j.eswa.2023.120495_b0095
Fawzi (10.1016/j.eswa.2023.120495_b0390) 2022; 610
Cao (10.1016/j.eswa.2023.120495_b0235) 2022; 27
Pong (10.1016/j.eswa.2023.120495_b1220) 2018
Dietterich (10.1016/j.eswa.2023.120495_b0320) 2000; 13
Silver (10.1016/j.eswa.2023.120495_b1405) 2016; 529
Hester (10.1016/j.eswa.2023.120495_b0585) 2018
Banerjee (10.1016/j.eswa.2023.120495_b0115) 2021; 67
10.1016/j.eswa.2023.120495_b0405
10.1016/j.eswa.2023.120495_b0525
10.1016/j.eswa.2023.120495_b0765
10.1016/j
References_xml – reference: Kumar, A., Zhou, A., Tucker, G., & Levine, S. (2020). Conservative Q-learning for offline reinforcement learning. In
– reference: Strehl, A. L., Lihong, L., Wiewiora, E., Langford, J., & Littman, M. L. (2006). PAC model-free reinforcement learning. In
– reference: (pp. 1-9).
– reference: Jaques, N., Gu, S., Bahdanau, D., Herńandez-Lobato, J. M., Turner, R. E., and Eck, D. (2017). Sequence tutor: Conservative fine-tuning of sequence generation models with KL-control. In
– year: 2008
  ident: b0560
  article-title: Neural Networks and Learning Machines
– reference: Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., & Kavukcuoglu, K. (2016). Reinforcement learning with unsupervised auxiliary tasks. arXiv preprint arXiv:1611.05397.
– volume: 97
  start-page: 5331
  year: 2019
  end-page: 5340
  ident: b1250
  publication-title: Efficient off-policy meta-reinforcement learning via probabilistic context variables
– reference: Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized experience replay. In
– volume: vol. 1
  year: 2005
  ident: b0180
  publication-title: Dynamic programming and optimal control
– volume: 13
  start-page: 41
  year: 2003
  end-page: 77
  ident: b0125
  article-title: Recent advances in hierarchical reinforcement learning
  publication-title: Discrete Event Dynamic Systems: Theory and Applications
– reference: (pp. 361–368).
– reference: Leike, J., Martic, M., Krakovna, V., Ortega, P. A., Everitt, T., Lefrancq, A., Lefrancq, L., & Legg, S. (2017). AI safety gridworlds. arXiv preprint arXiv:1711.09883.
– volume: 5
  start-page: 27091
  year: 2017
  end-page: 27102
  ident: b1110
  article-title: System design perspective for human-level agents using deep reinforcement learning: A survey
  publication-title: IEEE Access
– reference: Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In
– volume: 182
  start-page: 115127
  year: 2021
  ident: b1460
  article-title: Deep graph convolutional reinforcement learning for financial portfolio management – DeepPocket
  publication-title: Expert Systems With Applications
– reference: Oh, J., Singh, S., & Lee, H. (2017). Value prediction network
– volume: 38
  start-page: 156
  year: 2008
  end-page: 172
  ident: b0215
  article-title: A comprehensive survey of multiagent reinforcement learning
  publication-title: IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews
– volume: 9
  start-page: 67259
  year: 2021
  end-page: 67267
  ident: b0610
  article-title: Reward shaping based federated reinforcement learning
  publication-title: IEEE Access
– reference: In
– reference: (pp. 74–98).
– reference: Ciosek, K., Vuong, Q., Loftin, R., & Hofmann, K. (2019). Better exploration with optimistic actor-critic. In
– reference: Castro, P. S., Moitra, S., Gelada, C., Kumar, S., & Bellemare, M. G. (2018). Dopamine: A research framework for deep reinforcement learning. arXiv preprint arXiv: 1812.06110.
– start-page: 7667
  year: 2021
  end-page: 7674
  ident: b0540
  article-title: Learning with safety constraints: Sample complexity of reinforcement learning for constrained MDPs
  publication-title: Proceedings of the 35th AAAI Conference on Artificial Intelligence
– reference: Nair, A., Srinivasan, P., Blackwell, S., Alcicek, C., Fearon, R., Maria, A. D., Panneershelvam, V., Suleyman, M., Beattie, C., Petersen, S., Legg, S., Mnih, V., Kavukcuoglu, K., & Silver, D. (2015). Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296v2.
– reference: Levine, S., & Koltun, V. (2013). Guided policy search. In
– reference: Iqbal, S., & Sha, F. (2019). Actor-attention-critic for multi-agent reinforcement learning. In
– reference: Kohl, N., & Stone, P. (2004). Policy gradient reinforcement learning for fast quadrupedal locomotion. In
– start-page: 1
  year: 2020
  end-page: 6
  ident: b1805
  article-title: State representation learning for effective deep reinforcement learning
  publication-title: IEEE International Conference on Multimedia and Expo. (ICME)
– volume: 6
  start-page: 679
  year: 1957
  end-page: 684
  ident: b0160
  article-title: A Markovian decision process
  publication-title: Journal of Mathematics and Mechanics
– start-page: 2746
  year: 2015
  end-page: 2754
  ident: b1685
  article-title: Embed to control: A locally linear latent dynamics model for control from raw images
  publication-title: Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS)
– reference: Bellemare, M. G., Veness, J., & Bowling, M. (2012). Investigating Contingency Awareness Using Atari 2600 Games. In
– reference: (pp. 1071-1079).
– reference: Liu, S., Ngiam, K. Y., & Feng, M. (2019). Deep reinforcement learning for clinical decision support: A brief survey. arXiv preprint arXiv: 1907.09475.
– reference: Fakoor, R., Chaudhari, P., Soatto, S., & Smola, A. J. (2020). META-Q-Learning. arXiv preprint arXiv:1910.00125.
– volume: 22
  start-page: 4550
  year: 2021
  end-page: 4559
  ident: b1075
  article-title: A generative adversarial network enabled deep distributional reinforcement learning for transmission scheduling in internet of vehicles
  publication-title: IEEE Transactions on Intelligent Transportation Systems
– reference: Bellemare, M. G., Dabney, W., & Munos, R. (2017). A distributional perspective on reinforcement learning. In
– volume: 22
  start-page: 7208
  year: 2021
  end-page: 7218
  ident: b1800
  article-title: A hybrid of deep reinforcement learning and local search for the vehicle routing problems
  publication-title: IEEE Transactions on Intelligent Transportation Systems
– year: 1927
  ident: b1180
  article-title: Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex
– year: 2018
  ident: b1545
  article-title: Reinforcement Learning An Introduction
– reference: Liu, J., & Feng, L. (2021). Diversity evolutionary policy deep reinforcement learning.
– volume: 550
  start-page: 354
  year: 2017
  end-page: 359
  ident: b1425
  article-title: Mastering the game of Go without human knowledge
  publication-title: Nature
– reference: Rudin, N., Hoeller, D., Reist, P., & Hutter, M. (2021). Learning to walk in minutes using massively parallel deep reinforcement learning. arXiv preprint arXiv:2109.11978.
– start-page: 5285
  year: 2017
  end-page: 5294
  ident: b1720
  article-title: Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation
  publication-title: Proceddings of the 31st Conference on Neural Information Processing Systems
– volume: 8
  start-page: 341
  year: 1992
  end-page: 362
  ident: b0300
  article-title: The convergence of TD(λ) for general λ
  publication-title: Machine Learning
– reference: Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Pfaff , T., Weber, T., Buesing, L., & Battaglia, P. W. (2020). Combining Q-learning and search with amortized value estimates.
– volume: 27
  start-page: 1378
  year: 2019
  end-page: 1391
  ident: b0250
  article-title: AgentGraph: Toward universal dialogue management with structured deep reinforcement learning
  publication-title: IEEE/ACM Transactions on Audio Speech and Language Processing
– reference: (pp. 4344–4353).
– volume: v1.3.5
  start-page: 2013
  year: 2013
  ident: b1730
  publication-title: TORCS, The open racing car simulator
– volume: 86
  start-page: 153
  year: 2017
  end-page: 173
  ident: b1210
  article-title: Survey of model-based reinforcement learning: Applications on Robotics
  publication-title: Journal of Intelligent and Robotic Systems: Theory and Applications
– volume: 362
  start-page: 1140
  year: 2018
  end-page: 1144
  ident: b1410
  article-title: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
  publication-title: Science
– volume: 3
  start-page: 72
  year: 1978
  end-page: 75
  ident: b1520
  article-title: Single channel theory: A neuronal theory of learning
  publication-title: Brain Theory Newsletter
– volume: 22
  start-page: 33
  year: 1996
  end-page: 57
  ident: b0205
  article-title: Linear least-squares algorithms for temporal difference learning
  publication-title: Machine Learning
– reference: (pp. 2145–2153).
– year: 2016
  ident: b1355
  article-title: Optimizing Expectations: From deep reinforcement learning to stochastic computation graphs
– volume: 55
  start-page: 895
  year: 2022
  end-page: 943
  ident: b0480
  article-title: Multi-agent deep reinforcement learning: A survey
  publication-title: Artificial Intelligence Review
– reference: Rawlik, K., Toussaint, M., & Vijayakumar, S. (2012). On stochastic optimal control and reinforcement learning by approximate inference. In
– reference: , 2021.
– reference: (pp. 4754-4765).
– volume: 88
  start-page: 103360
  year: 2020
  ident: b0460
  article-title: Teaching a humanoid robot to walk faster through safe reinforcement learning
  publication-title: Engineering Applications of Artificial Intelligence
– reference: Sun, P., Zhou, W., & Li, H. (2020b). Attentive experience replay. In
– start-page: 3040
  year: 2019
  end-page: 3049
  ident: b0655
  article-title: Social influence as intrinsic motivation for multi-agent deep reinforcement learning
  publication-title: Proceedings of the 36th International Conference on Machine Learning
– volume: 9
  start-page: 5785
  year: 2022
  end-page: 5798
  ident: b0330
  article-title: Trajectory design and access control for air – Ground coordinated communications system with multiagent deep reinforcement learning
  publication-title: IEEE Internet of Things Journal
– volume: 41
  start-page: 256
  year: 1950
  end-page: 275
  ident: b1385
  article-title: XXII. Programming a computer for playing chess
  publication-title: Philosophical Magazine and Journal of Science
– volume: 5
  start-page: 1143
  year: 2020
  end-page: 1150
  ident: b0040
  article-title: Learning robust control policies for end-to-end autonomous driving from data-driven simulation
  publication-title: IEEE Robotics and Automation Letters
– reference: Peng, P., Wen, Y., Yang, Y., Yuan, Q., Tang, Z., Long, H., & Wang, J. (2017). Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games. arXiv preprint arXiv: 1703.10069.
– reference: Akkaya, I., Andrychowicz, M., Chociej, M., Litwin, M., McGrew, B., Petron, A., Paino, A., Plappert, M., Powell, G., Ribas, R., Schneider, J., Tezak, N., Tworek, J., Welinder, P., Weng, L., Yuan, Q., Zaremba, W., & Zhang, L. (2019). Solving Rubik’s cube with a robot hand. arXiv preprint arXiv: 1910.07113.
– volume: 42
  start-page: 1143
  year: 2003
  end-page: 1166
  ident: b0770
  article-title: On actor-critic algorithms
  publication-title: SIAM Journal on Control and Optimization
– start-page: 630
  year: 2016
  end-page: 645
  ident: b0565
  article-title: Identity mappings in deep residual networks
  publication-title: Proceedings of the European Conference on Computer Vision
– start-page: 3223
  year: 2018
  end-page: 3230
  ident: b0585
  article-title: Deep Q-learning from demonstrations
  publication-title: Proceedings of the 32nd AAAI Conference on Artificial Intelligence
– reference: Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv: 1707.06347.
– reference: (pp. 881–888).
– volume: 42
  start-page: 674
  year: 1997
  end-page: 690
  ident: b1595
  article-title: An analysis of temporal-difference learning with function approximation
  publication-title: IEEE Transactions on Automatic Control
– reference: Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy-based policies. In
– volume: 2
  start-page: 137
  year: 1968
  end-page: 152
  ident: b1010
  article-title: BOXES, An experiment in adaptive control
  publication-title: Machine Intelligence
– start-page: 1
  year: 2017
  end-page: 20
  ident: b1650
  article-title: Sample efficient actor-critic with experience replay
  publication-title: Proceedings of the 5th International Conference on Learning Representations (ICLR)
– reference: Levine, S., & Abbeel, P. (2014). Learning neural network policies with guided policy search under unknown dynamics. In
– reference: (pp. 1098-1115).
– reference: (pp. 5023–5033).
– reference: (2022)104848, 1–16.
– reference: Shannon, C. E. (1952). “Theseus” maze-solving mouse. Retrieved from http://cyberneticzoo.com/mazesolvers/1952-–-theseus-maze-solving-mouse-–-claude-shannon-american/. Accessed March 10, 2023.
– reference: Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv: 1312.5602.
– start-page: 1
  year: 2022
  end-page: 12
  ident: b0295
  article-title: Distributed actor-critic algorithms for multiagent reinforcement learning over directed graphs
  publication-title: IEEE Transactions On Neural Networks and Learning Systems
– start-page: 1
  year: 2020
  end-page: 6
  ident: b0615
  article-title: GAN-based deep distributional reinforcement learning for resource management in network slicing
  publication-title: Proceedings of the 2019 IEEE Global Communications Conference
– reference: Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., & Wierstra, D., (2016). Continuous control with deep reinforcement learning. In
– reference: (pp. 6383–6393).
– reference: Zanette, A. & Brunskill, E. (2019). Tighter problem-dependent regret bounds in reinforcement learning without domain knowledge using value function bounds. In
– reference: Kaiser, Ł., Babaeizadeh, M., Miłos, P., Osinski, B., Campbell, R. H., Czechowski, K., Erhan. D., Finn. C., Kozakowski. P., Levine. S., Mohuuddin. A., Sepassi. R., Tucker. G., & Michalewski, H. (2020). Model based reinforcement learning for atari. arXiv preprint arXiv:1903.00374.
– reference: Silver, D., Newnham,
– volume: 183
  start-page: 107575
  year: 2020
  ident: b1505
  article-title: Efficient flow migration for NFV with Graph-aware deep reinforcement learning
  publication-title: Computer Networks
– volume: 47
  start-page: 253
  year: 2013
  end-page: 279
  ident: b0145
  article-title: The arcade learning environment: An evaluation platform for general agents
  publication-title: Journal of Artificial Intelligence Research
– reference: Weber, T., Racanière, S., Reichert, D. P., Buesing, L., Guez,A., Rezende, D. J., Badia, A. P., Vinyals, O., Heess, N., Li, Y., Pascanu, R., Battaglia, P., Hassabis, D., Silver, D., & Wierstra, D. (2017). Imagination-augmented agents for deep reinforcement learning. arXiv preprint arXiv: 1707.06203v2.
– reference: Deisenroth, M. P., & Rasmussen, C. E. (2011). PILCO: A model-based and data-efficient approach to policy search. In
– volume: 139
  start-page: 1
  year: 2020
  end-page: 30
  ident: b1120
  article-title: A review On reinforcement learning: Introduction and applications in industrial process control
  publication-title: Computers and Chemical Engineering
– reference: Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., Legg, S., & Kavukcuoglu, K. (2018). IMPALA: Scalable distributed Deep-RL with importance weighted actor-learner architectures. In
– volume: 145
  start-page: 271
  year: 2022
  end-page: 287
  ident: b1490
  article-title: Reinforcement learning and its connections with neuroscience and psychology
  publication-title: Neural Networks
– start-page: 5872
  year: 2018
  end-page: 5881
  ident: b1795
  article-title: Fully decentralized multi-agent reinforcement learning with networked agents
  publication-title: Proceedings of the 35th International Conference on Machine Learning, PMLR 80
– volume: 13
  start-page: 2935
  year: 2022
  end-page: 2958
  ident: b0265
  article-title: Reinforcement learning for selective key applications in power systems: Recent advances and future challenges
  publication-title: IEEE Transactions on Smart Grid
– reference: Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Reinforcement learning with soft state aggregation.
– start-page: 3215
  year: 2018
  end-page: 3222
  ident: b0580
  article-title: Rainbow: Combining improvements in deep reinforcement learning
  publication-title: 32nd AAAI Conference on Artificial Intelligence (AAAI)
– year: 1998
  ident: b1540
  article-title: Introduction to Reinforcement Learning
– volume: 173
  start-page: 114663
  year: 2021
  ident: b1765
  article-title: Reinforcement learning approach for resource allocation in humanitarian logistics
  publication-title: Expert Systems With Applications
– volume: 18
  start-page: 6070
  year: 2017
  end-page: 6120
  ident: b0270
  article-title: Risk-constrained reinforcement learning with percentile risk criteria
  publication-title: The Journal of Machine Learning Research
– volume: 21
  start-page: 1
  year: 2021
  end-page: 20
  ident: b0055
  article-title: Reinforcement learning-based complete area coverage path planning for a modified hTrihex robot
  publication-title: Sensors
– volume: 134
  start-page: 57
  year: 2002
  end-page: 83
  ident: b0230
  article-title: Deep Blue
  publication-title: Artificial Intelligence
– start-page: 1
  year: 2016
  end-page: 16
  ident: b1170
  article-title: Actor-mimic: Deep multitask and transfer reinforcement learning
  publication-title: Proceedings of the 4th International Conference on Learning Representations (ICLR)
– volume: 3
  start-page: 210
  year: 1959
  end-page: 229
  ident: b1315
  article-title: Some studies in machine learning using the game of Checkers
  publication-title: IBM Journal of Research and Development
– reference: Yu, T., Thomas, G., Yu, L., Ermon, S., Zou, J., Levine, S., Finn, C., & Ma, T. (2020). MOPO: Model-based offline policy optimization. In
– reference: Li, Y. (2018). Deep reinforcement learning. arXiv preprint arXiv:1810.06339v1.
– volume: 20
  start-page: 61
  year: 2009
  end-page: 80
  ident: b1320
  article-title: The graph neural network model
  publication-title: IEEE Transactions on Neural Networks
– volume: 55
  start-page: 945
  year: 2022
  end-page: 990
  ident: b1445
  article-title: Reinforcement learning in robotic applications: A comprehensive survey
  publication-title: Artificial Intelligence Review
– reference: Fujimoto, S., Van Hoof, H., & Meger, D. (2018). Addressing function approximation error in actor-critic methods. In
– volume: 38
  start-page: 58
  year: 1995
  end-page: 67
  ident: b1580
  article-title: Temporal difference learning and TD-Gammon
  publication-title: Communications of the ACM
– reference: MathWorks, Block diagram of reinforcement learning. (2023). Retrieved from https://www.mathworks.com/help/reinforcement-learning/ug/create-simulink-environments-for-reinforcement-learning.html. Accessed March 10, 2023.
– reference: Riedmiller, M. (2005). Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method. In
– volume: 16
  start-page: 221
  year: 1956
  end-page: 229
  ident: b0155
  article-title: A problem in the sequential design of experiments
  publication-title: The Indian Journal of Statistics
– volume: 16
  start-page: 105
  year: 2002
  end-page: 133
  ident: b1450
  article-title: Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system
  publication-title: Journal of Artificial Intelligence Research
– volume: 529
  start-page: 484
  year: 2016
  end-page: 489
  ident: b1405
  article-title: Mastering the game of Go with deep neural networks and tree search
  publication-title: Nature
– volume: 15
  start-page: 210
  year: 1970
  end-page: 221
  ident: b0430
  article-title: Learning control systems—Review and outlook
  publication-title: IEEE Transactions on Automatic Control
– start-page: 26
  year: 2017
  end-page: 38
  ident: b0070
  article-title: Deep reinforcement learning: A brief survey
  publication-title: IEEE Signal Processing Magazine
– volume: 538
  start-page: 142
  year: 2020
  end-page: 158
  ident: b1715
  article-title: Adaptive stock trading strategies with deep reinforcement learning methods
  publication-title: Information Sciences
– volume: 40
  start-page: 935
  year: 2022
  end-page: 946
  ident: b0880
  article-title: GNN-based hierarchical deep reinforcement learning for NFV-oriented online resource orchestration in elastic optical DCIs
  publication-title: Journal of Lightwave Technology
– volume: 11
  start-page: 1563
  year: 2010
  end-page: 1600
  ident: b0645
  article-title: Near-optimal regret bounds for reinforcement learning
  publication-title: Journal of Machine Learning Research
– volume: 104
  start-page: 104630
  year: 2020
  ident: b0305
  article-title: Vision-based robust control framework based on deep reinforcement learning applied to autonomous ground vehicles
  publication-title: Control Engineering Practice
– start-page: 216
  year: 1990
  end-page: 224
  ident: b1525
  article-title: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
  publication-title: Proceedings of the 7th International Conference on Machine Learning
– volume: 9
  start-page: 1735
  year: 1997
  end-page: 1780
  ident: b0590
  article-title: Long Short-Term Memory
  publication-title: Neural Computation
– volume: 8
  start-page: 208992
  year: 2020
  end-page: 209007
  ident: b0075
  article-title: Reinforcement learning techniques for optimal power control in grid-connected microgrids: A comprehensive review
  publication-title: IEEE Access
– volume: 34
  start-page: 286
  year: 1977
  end-page: 295
  ident: b1710
  article-title: An adaptive optimal controller for discrete-time Markov environments
  publication-title: Information and Control
– reference: Schulman, J., Moritz, P., Levine, S., Jordan, M. I., & Abbeel, P. (2016). High-dimensional continuous control using generalized advantage estimation. In
– volume: 112
  start-page: 181
  year: 1999
  end-page: 211
  ident: b1555
  article-title: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
  publication-title: Artificial Intelligence
– volume: 26
  start-page: 674
  year: 2021
  end-page: 691
  ident: b1825
  article-title: Deep reinforcement learning based mobile robot navigation: A review
  publication-title: Tsinghua Science and Technology
– volume: 6
  start-page: 355
  year: 2014
  end-page: 366
  ident: b0555
  article-title: A neuroevolution approach to general Atari game playing
  publication-title: IEEE Transactions on Computational Intelligence and AI in Games
– volume: 88
  start-page: 135
  year: 1981
  end-page: 170
  ident: b1535
  article-title: Toward a modern theory of adaptive networks: Expectation and prediction
  publication-title: Psychological Review
– year: 1994
  ident: b1230
  article-title: Markov Decision Processes: Discrete Stochastic Dynamic Programming
– volume: 81
  start-page: 15395
  year: 2022
  end-page: 15417
  ident: b0715
  article-title: Deep reinforcement learning approach for manuscripts image classification and retrieval
  publication-title: Multimedia Tools and Applications
– volume: 40
  start-page: 1721
  year: 2013
  end-page: 1736
  ident: b1015
  article-title: Neural network reinforcement learning for visual control of robot manipulator
  publication-title: Expert Systems With Applications
– volume: 1
  start-page: 228
  year: 1958
  end-page: 239
  ident: b0165
  article-title: Dynamic programming and stochastic control processes
  publication-title: Information and Control
– year: 2016
  ident: b0210
  publication-title: OpenAI Gym.
– volume: 18
  start-page: 2041
  year: 2022
  end-page: 2052
  ident: b1240
  article-title: Modeling, detecting, and mitigating threats against industrial healthcare systems: A combined software defined networking and reinforcement learning approach
  publication-title: IEEE Transactions on Industrial Informatics
– reference: Schmitt, S., Hessel, M., & Simonyan, K. (2019). Off-policy actor-critic with shared experience replay. arXiv preprint arXiv:1909.11583.
– volume: 106
  start-page: 104451
  year: 2021
  ident: b1750
  article-title: Quantum deep reinforcement learning for rotor side converter control of double-fed induction generator-based wind turbines
  publication-title: Engineering Applications of Artificial Intelligence
– reference: Schulman, J., Levine, S., Moritz, P., Jordan, M., & Abbeel, P. (2015). Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning, PMLR 37 (pp. 1889–1897).
– reference: Sukhbaatar, S., Szlam, A., & Fergus, R. (2016). Learning multiagent communication with backpropagation. In
– volume: 4
  start-page: 132
  year: 2019
  end-page: 141
  ident: b1785
  article-title: Energy-efficient scheduling for real-time systems based on deep Q-learning model
  publication-title: IEEE Transactions on Sustainable Computing
– volume: 8
  start-page: 171058
  year: 2020
  end-page: 171077
  ident: b0035
  article-title: Reinforcement learning interpretation methods: A survey
  publication-title: IEEE Access
– volume: 23
  start-page: 4909
  year: 2022
  end-page: 4926
  ident: b0725
  article-title: Deep reinforcement learning for autonomous driving: A survey
  publication-title: IEEE Transactions on Intelligent Transportation Systems
– volume: 575
  start-page: 350
  year: 2019
  end-page: 354
  ident: b1625
  article-title: Grandmaster level in StarCraft II using multi-agent reinforcement learning
  publication-title: Nature
– volume: 50
  start-page: 1
  year: 2022
  end-page: 22
  ident: b0435
  article-title: Applications of reinforcement learning for building energy efficiency control: A review
  publication-title: Journal of Building Engineering
– reference: Luo, F., Xu, T., Lai, H., Chen, X., Zhang, W., & Yu, Y. (2022a). A survey on model-based reinforcement learning. arXiv preprint arXiv:2206.09328v1.
– reference: Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. In
– reference: Gupta, J. K., Egorov, M., & Kochenderfer, M. (2017). Cooperative multi-agent control using deep reinforcement learning. In
– reference: Fujimoto, S., Meger, D., & Precup, D. (2019). Off-policy deep reinforcement learning without exploration. In
– reference: Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. arXiv preprint arXiv:1505.04597.
– reference: Van Seijen, H., & Sutton, R. S. (2014). True online TD(λ). In
– volume: 261
  start-page: 1
  year: 2014
  end-page: 31
  ident: b1740
  article-title: Reinforcement learning algorithms with function approximation: Recent advances and applications
  publication-title: Information Sciences
– reference: Kulkarni, T. D., Narasimhan, K. R., Saeedi, A., & Tenenbaum, J. B. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In
– reference: Luo, J., Li, C., Fan, Q., & Liu, Y. (2022b). A graph convolutional encoder and multi-head attention decoder network for TSP via reinforcement learning.
– start-page: 177
  year: 2009
  end-page: 184
  ident: b1610
  article-title: A theoretical and empirical analysis of Expected Sarsa
  publication-title: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (IEEE)
– volume: 21
  start-page: 682
  year: 2008
  end-page: 697
  ident: b1205
  article-title: Reinforcement learning of motor skills with policy gradients
  publication-title: Neural Networks
– volume: 55
  start-page: 1
  year: 2022
  end-page: 38
  ident: b0020
  article-title: Reinforcement learning based recommender systems: A survey
  publication-title: ACM Computing Surveys
– volume: 596
  start-page: 583
  year: 2021
  end-page: 589
  ident: b0670
  article-title: Highly accurate protein structure prediction with AlphaFold
  publication-title: Nature
– reference: Foerster, J., Nardelli, N., Farquhar, G., Afouras, T., Torr, P. H. S., Kohli, P., & Whiteson, S. (2017). Stabilising experience replay for deep multi-agent reinforcement learning. In
– volume: 54
  start-page: 3215
  year: 2020
  end-page: 3238
  ident: b0335
  article-title: A survey on multi-agent deep reinforcement learning: From the perspective of challenges and applications
  publication-title: Artificial Intelligence Review
– reference: Schaefer, A. M., Schneegass, D., Sterzing, V., & Udluft, S. (2007). A neural reinforcement learning approach to gas turbine control. In
– year: 1972
  ident: b0170
  article-title: Dynamic programming
– start-page: 1995
  year: 2016
  end-page: 2003
  ident: b1670
  article-title: Dueling network architectures for deep reinforcement learning
  publication-title: Proceedings of the 33rd International Conference on Machine Learning (ICML)
– volume: 18
  start-page: 2936
  year: 2006
  end-page: 2941
  ident: b1565
  article-title: Learning Tetris using the noisy cross-entropy method
  publication-title: Neural Computation
– reference: Duan, Y., Chen, X., Houthooft, R., Schulman, J., & Abbeel, P. (2016). Benchmarking deep reinforcement learning for continuous control. In
– reference: Foerster, J. N., Assael, Y. M., Freitas, N. de, & Whiteson, S. (2016b). Learning to communicate to solve riddles with deep distributed recurrent Q-networks. arXiv preprint arXiv:1602.02672.
– reference: Glanois, C., Weng, P., Zimmer, M., Li, D., Yang, T., Hao, J., & Liu, W. (2022). A survey on interpretable reinforcement learning. arXiv preprint arXiv: 2112.13112v2.
– reference: Hafner, D., Pasukonis, J., Ba, J., & Lillicrap, T. (2023). Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104v1.
– reference: Wayne, G., Hung, C. C., Amos, D., Mirza, M., Ahuja, A., Barwinska, A. G., Rae, J., Mirowski, P., Leibo, J. Z., Santoro, A., Gemici, M., Reynolds, M., Harley, T., Abramson, J., Mohamed, S., Rezende, D., Saxton, D., Cain, A., Hillier, C., Silver, D., Kavukcuoglu, K., Botvinick, M., Hassabis, D., & Lillicrap, T. (2018). Unsupervised predictive memory in a goal-directed agent. arXiv preprint arXiv: 1803.10760.
– reference: Feinberg, V., Wan, A., Stoica, I., Jordan, M. I., Gonzalez, J. E., & Levine, S. (2018). Model-based value expansion for efficient model-free reinforcement learning. arXiv preprint arXiv: 1803.00101v1.
– reference: Guss, W. H., Castro, M. Y., Devlin, S., Houghton, B., Kuno, N. S., Loomis, C., Milani, S., Mohanty, S., Nakata, K., Salakhutdinov, R., Schulman, J., Shiroshita, S., Topin, N., Ummadisingu, A., & Vinyals, O. (2021). NeurIPS 2020 Competition: The MineRL competition on sample efficient reinforcement learning using human priors. arXiv preprint arXiv:2101.11071.
– reference: Kirsch, L., Steenkiste, S. Van, & Schmidhuber, J. (2020). Improving generalization in meta reinforcement learning using learned objectives. arXiv preprint arXiv:1910.04098.
– reference: Marbach, P., Mihatsch, O., & Tsitsiklis, J. N. (1998). Call admission control and routing in integrated services networks using reinforcement learning. In
– volume: 64
  start-page: 81
  year: 2022
  end-page: 93
  ident: b0620
  article-title: Graph neural network and multi-agent reinforcement learning for machine-process-system integrated control to optimize production yield
  publication-title: Journal of Manufacturing Systems
– reference: Srinivas, A., Jabri, A., Abbeel, P., Levine, S., & Finn, C. (2018). Universal planning networks. arXiv preprint arXiv:1804.00645.
– reference: Wahlström, N., Schön, T. B., & Deisenroth, M. P. (2015). From pixels to torques: Policy learning with deep dynamical models. arXiv preprint arXiv: 1502.02251.
– volume: 40
  start-page: 75
  year: 2023
  end-page: 101
  ident: b0875
  article-title: Deep reinforcement learning in smart manufacturing: A review and prospects
  publication-title: CIRP Journal of Manufacturing Science and Technology
– reference: Abdoos, M., Mozayani, N., & Bazzan, A. L. C. (2011). Traffic light control in non-stationary environments based on multi-agent Q-learning. In
– volume: 95
  start-page: 103869
  year: 2020
  ident: b0750
  article-title: Reinforcement learning for quadrupedal locomotion with design of continual–hierarchical curriculum
  publication-title: Engineering Applications of Artificial Intelligence
– volume: 8
  start-page: 293
  year: 1992
  end-page: 321
  ident: b0895
  article-title: Self-improving reactive agents based on reinforcement learning, planning and teaching
  publication-title: Machine Learning
– start-page: 11
  year: 1975
  end-page: 13
  ident: b0740
  article-title: A comparison of natural and artificial intelligence
  publication-title: ACM SIGART Bulletin
– reference: Elmo: Computer Shogi Association, Results of the 27th world computer shogi championship. (2023). Retrieved from http://www2.computer-shogi.org/wcsc27/index_e.html. Accessed March 10, 2023.
– volume: 521
  start-page: 436
  year: 2015
  end-page: 444
  ident: b0825
  article-title: Deep learning
  publication-title: Nature
– reference: Foerster, J. N., Assael, Y. M., De Freitas, N., & Whiteson, S. (2016a). Learning to communicate with deep multi-agent reinforcement learning. In
– reference: Fortunato, M., Azar, M. G., Piot, B., Menick, J., Osband, I, Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., Blundell, C., & Legg, S. (2017). Noisy networks for exploration. arXiv preprint arXiv: 1706.10295v3.
– volume: 12
  start-page: 875
  year: 2001
  end-page: 889
  ident: b1050
  article-title: Learning to trade via direct reinforcement
  publication-title: IEEE Transactions on Neural Network
– volume: 191
  start-page: 116285
  year: 2022
  ident: b1780
  article-title: A distributed real-time pricing strategy based on reinforcement learning approach for smart grid
  publication-title: Expert Systems With Applications
– volume: 45
  start-page: 2673
  year: 1997
  end-page: 2681
  ident: b1375
  article-title: Bidirectional recurrent neural networks
  publication-title: IEEE Transactions on Signal Processing
– volume: 22
  start-page: 1
  year: 2022
  end-page: 29
  ident: b0375
  article-title: A novel reinforcement learning collision avoidance algorithm for USVs based on maneuvering characteristics and COLREGs
  publication-title: Sensors
– volume: 27
  start-page: 1011
  year: 2022
  end-page: 1022
  ident: b0235
  article-title: A learning-based vehicle trajectory-tracking approach for autonomous vehicles with lidar failure under various lighting conditions
  publication-title: IEEE/ASME Transactions on Mechatronics
– volume: 71
  start-page: 2511
  year: 2022
  end-page: 2526
  ident: b1065
  article-title: Bio-inspired collision avoidance in swarm systems via deep reinforcement learning
  publication-title: IEEE Transactions on Vehicular Technology
– reference: Matignon, L., Laurent, G. J., & Le Fort-Piat, N. (2007). Hysteretic Q-learning: An algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In
– volume: 11
  start-page: 11
  year: 1997
  end-page: 73
  ident: b0080
  article-title: Locally Weighted Learning
  publication-title: Artificial Intelligence Review
– volume: 8
  start-page: 229
  year: 1992
  end-page: 256
  ident: b1705
  article-title: Simple statistical gradient-following algorithms for connectionist reinforcement learning
  publication-title: Machine Learning
– volume: 38
  start-page: 126
  year: 2019
  end-page: 145
  ident: b0775
  article-title: SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards
  publication-title: The International Journal of Robotics Research
– start-page: 650
  year: 2007
  end-page: 657
  ident: b0685
  article-title: Batch reinforcement learning in a complex domain
  publication-title: Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems
– volume: 9
  start-page: 3259
  year: 2018
  end-page: 3269
  ident: b0285
  article-title: Convolutional neural networks for automatic state-time feature extraction in reinforcement learning applied to residential load control
  publication-title: IEEE Transactions on Smart Grid
– volume: 8
  start-page: 323
  year: 1999
  end-page: 338
  ident: b1275
  article-title: Concepts and facilities of a neural reinforcement learning control architecture for technical process control
  publication-title: Neural Computing and Applications
– reference: Agrawal, S. & Jia, R. (2017). Optimistic posterior sampling for reinforcement learning: worst-case regret bounds. In
– volume: 49
  start-page: 161
  year: 2002
  end-page: 178
  ident: b1145
  article-title: Kernel-based reinforcement learning
  publication-title: Machine Learning
– volume: 59
  start-page: 3166
  year: 2019
  end-page: 3176
  ident: b1475
  article-title: Deep reinforcement learning for multiparameter optimization in de novo drug design
  publication-title: Journal of Chemical Information and Modeling
– reference: Kalweit, G., & Boedecker, J. (2017). Uncertainty driven imagination for continuous deep reinforcement learning. In
– volume: 78
  start-page: 236
  year: 2019
  end-page: 247
  ident: b1165
  article-title: Reinforcement learning based compensation methods for robot manipulators
  publication-title: Engineering Applications of Artificial Intelligence
– reference: Kapturowski, S., Ostrovski, G., Quan, J., Munos, R., & Dabney, W. (2019). Recurrent experience replay in distributed reinforcement learning. In
– reference: Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In
– reference: Gao, Y., Xu, H., Lin, Ji., Yu, F., Levine, S., & Darrell, T. (2018). Reinforcement learning from imperfect demonstrations. arXiv preprint arXiv: 1802.05313.
– reference: Hasselt, H. V. (2010). Double Q-learning. In
– volume: 33
  start-page: 2045
  year: 2022
  end-page: 2056
  ident: b0800
  article-title: Deep reinforcement learning with modulated Hebbian plus Q-network architecture
  publication-title: IEEE Transactions on Neural Networks and Learning Systems
– reference: Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Pinto, H. P. d. O., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., & Zhang, S. (2019). Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680.
– volume: 21
  start-page: 3133
  year: 2019
  end-page: 3174
  ident: b0950
  article-title: Applications of deep reinforcement learning in communications and networking: A survey
  publication-title: IEEE Communications Surveys and Tutorials
– volume: 8
  start-page: 176598
  year: 2020
  end-page: 176623
  ident: b0710
  article-title: A systematic review on reinforcement learning-based robotics within the last decade
  publication-title: IEEE Access
– reference: Salakhutdinov, R., & Hinton, G. (2009). Deep Boltzmann Machines. In
– volume: 4
  start-page: 217
  year: 1981
  end-page: 246
  ident: b1530
  article-title: An adaptive network that constructs and uses an internal model of its world
  publication-title: Cognition and Brain Theory
– volume: 8
  start-page: 8557
  year: 2021
  end-page: 8569
  ident: b0380
  article-title: Distributed deep reinforcement learning for renewable energy accommodation assessment with communication uncertainty in internet of energy
  publication-title: IEEE Internet of Things Journal
– reference: Scholl, P., Dietrich, F., Otte, C., & Udluft, S. (2023). Safe policy improvement approaches and their limitations. In
– start-page: 1054
  year: 2016
  end-page: 1062
  ident: b1060
  article-title: Safe and efficient off-policy reinforcement learning
  publication-title: Proceedings of the 30th Conference on Neural Information Processing Systems
– reference: Tassa, Y., Doron, Y., Muldal, A., Erez, T., Li, Y., Casas, D. de Las, Budden, D., Abdolmaleki, A., Merel, J., Lefrancq, A., Lillicrap, T., & Riedmiller, M. (2018). DeepMind Control Suite. arXiv preprint arXiv:1801.00690.
– volume: 602
  start-page: 328
  year: 2022
  end-page: 350
  ident: b1465
  article-title: AdaBoost maximum entropy deep inverse reinforcement learning with truncated gradient
  publication-title: Information Sciences
– start-page: 2863
  year: 2015
  end-page: 2871
  ident: b1130
  article-title: Action-conditional video prediction using deep networks in Atari games
  publication-title: 28th International Conference on Neural Information Processing Systems
– volume: 378
  start-page: 1092
  year: 2022
  end-page: 1097
  ident: b0855
  article-title: Competition-level code generation with AlphaCode
  publication-title: Science
– start-page: 1
  year: 2022
  end-page: 14
  ident: b0325
  article-title: Target-value-competition-based multi-agent deep reinforcement learning algorithm for distributed nonconvex economic dispatch
  publication-title: IEEE Transactions on Power Systems
– volume: 518
  start-page: 529
  year: 2015
  end-page: 533
  ident: b1040
  article-title: Human-level control through deep reinforcement learning
  publication-title: Nature
– volume: 65
  start-page: 87
  year: 2017
  end-page: 98
  ident: b0570
  article-title: Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies
  publication-title: Engineering Applications of Artificial Intelligence
– volume: 610
  start-page: 47
  year: 2022
  end-page: 53
  ident: b0390
  article-title: Discovering faster matrix multiplication algorithms with reinforcement learning
  publication-title: Nature
– reference: Nazari, M., Oroojlooy, A., Snyder, L. V., & Takáč, M. (2018). Reinforcement learning for solving the vehicle routing problem. arXiv preprint arXiv:1802.04240.
– reference: Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-critic algorithms. In
– start-page: 2681
  year: 2017
  end-page: 2690
  ident: b1140
  article-title: Deep decentralized multi-task multi-agent reinforcement learning under partial observability
  publication-title: Proceedings of the 34th International Conference on Machine Learning
– reference: Beattie, C., Leibo, J. Z., Teplyashin, D., Ward, T., Wainwright, M., Küttler, H., Lefrancq, A., Green, S., Valdés, V., Sadik, A., Schrittwieser, J., Anderson, K., York, S., Cant, M., Cain, A., Bolton, A., Gaffney, S., King, H., Hassabis, D., Legg, S., & Petersen, S. (2016). DeepMind Lab. arXiv preprint arXiv: 1612.03801.
– reference: Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., & Levine, S. (2018b). Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905.
– volume: 134
  start-page: 1
  year: 2021
  end-page: 15
  ident: b0990
  article-title: Reinforcement learning for combinatorial optimization: A survey
  publication-title: Computers & Operations Research
– reference: Klopf, A. H. (1972). Brain function and adaptive systems: A heterostatic theory, Technical Report, Air Force Cambridge Research Labs Hanscom AFB MA.
– volume: 8
  start-page: 225945
  year: 2020
  end-page: 225956
  ident: b0795
  article-title: Coverage path planning for decomposition reconfigurable grid-maps using deep reinforcement learning based travelling salesman problem
  publication-title: IEEE Access
– volume: 619
  start-page: 930
  year: 2023
  end-page: 946
  ident: b1660
  article-title: Solving combinatorial optimization problems over graphs with BERT-Based deep reinforcement learning
  publication-title: Information Sciences
– reference: Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual multi-agent policy gradients. In
– reference: Fu, J., Kumar, A., Nachum, O., Tucker, G., & Levine, S. (2021). D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv: 2004.07219v4.
– volume: 55
  start-page: 589
  year: 2019
  end-page: 591
  ident: b0985
  article-title: Q-RTS: A real-time swarm intelligence based on multi-agent Q-learning
  publication-title: Electronics Letters
– reference: Lin, J., Chiu, H., & Gau, R. (2021). Decentralized planning-assisted deep reinforcement learning for collision and obstacle avoidance in UAV networks. In
– volume: 331
  start-page: 443
  year: 2019
  end-page: 457
  ident: b1810
  article-title: Hybrid hierarchical reinforcement learning for online guidance and navigation with partial observability
  publication-title: Neurocomputing
– volume: 8
  start-page: 208016
  year: 2020
  end-page: 208044
  ident: b1255
  article-title: Deep reinforcement learning for traffic signal control: A review
  publication-title: IEEE Access
– year: 1982
  ident: b0745
  article-title: The hedonistic neuron: A theory of memory, learning, and intelligence
– volume: 21
  start-page: 363
  year: 2006
  end-page: 372
  ident: b1100
  article-title: Autonomous inverted helicopter flight via reinforcement learning
  publication-title: Experimental Robotics IX, Springer Tracts in Advanced Robotics
– start-page: 3207
  year: 2018
  end-page: 3214
  ident: b0575
  article-title: Deep reinforcement learning that matters
  publication-title: Proceedings of the 32nd AAAI Conference on Artificial Intelligence
– volume: 84
  start-page: 109
  year: 2011
  end-page: 136
  ident: b1400
  article-title: Informing sequential clinical decision-making through reinforcement learning: An empirical study
  publication-title: Machine Learning
– volume: 22
  start-page: 123
  year: 1996
  end-page: 158
  ident: b1455
  article-title: Reinforcement learning with replacing eligibility traces
  publication-title: Machine Learning
– volume: 7
  start-page: 617
  year: 2020
  end-page: 626
  ident: b0925
  article-title: Parallel reinforcement learning-based energy efficiency improvement for a cyber-physical system
  publication-title: IEEE/CAA Journal of Automatica Sinica
– reference: Silver, D., van Hasselt, H., Hessel, M., Schaul, T., Guez, A., Harley, T., Dulac-Arnold, G., Reichert, D., Rabinowitz, N., Barreto, A., & Degris, T. (2017b). The predictron: End-to-end learning and planning. In
– volume: 69
  start-page: 8554
  year: 2022
  end-page: 8565
  ident: b0935
  article-title: Deep reinforcement learning-based demand response for smart facilities energy management
  publication-title: IEEE Transactions on Industrial Electronics
– volume: 54
  start-page: 1
  year: 2021
  end-page: 35
  ident: b1175
  article-title: Hierarchical reinforcement learning: A comprehensive survey
  publication-title: ACM Computing Survey
– reference: Li, W., & Todorov, E. (2004). Iterative linear quadratic regulator design for nonlinear biological movement systems. In
– volume: 588
  start-page: 604
  year: 2020
  end-page: 609
  ident: b1350
  article-title: Mastering Atari, Go, chess and shogi by planning with a learned model
  publication-title: Nature
– volume: 57
  start-page: 469
  year: 2009
  end-page: 483
  ident: b0065
  article-title: A survey of robot learning from demonstration
  publication-title: Robotics and Autonomous Systems
– volume: 6
  start-page: 236
  year: 2019
  end-page: 246
  ident: b0255
  article-title: Parallel planning: A new motion planning framework for autonomous driving
  publication-title: IEEE/CAA Journal of Automatica Sinica
– year: 1996
  ident: b0185
  article-title: Neuro-dynamic programming
– volume: 50
  start-page: 119
  year: 2020
  end-page: 138
  ident: b0090
  article-title: From inverse optimal control to inverse reinforcement learning: A historical review
  publication-title: Annual Reviews in Control
– reference: Barreto, A., Dabney, W., Munos, R., Hunt, J. J., Schaul, T., Van Hasselt, H., & Silver, D. (2017). Successor features for transfer in reinforcement learning. In
– start-page: 1
  year: 2019
  end-page: 12
  ident: b1005
  article-title: Guided meta-policy search
  publication-title: Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)
– start-page: 312
  year: 1996
  end-page: 317
  ident: b0530
  article-title: Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation
  publication-title: Proceedings of the IEEE International Conference on Evolutionary Computation
– reference: Jiang, R., Zahavy, T., Xu, Z., White, A., Hessel, M., Blundell, C., & Hasselt, H. Van. (2021). Emphatic algorithms for deep reinforcement learning. In
– reference: Devlin, J., Chang, M., Kenton, L., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In
– reference: Laroche, R., Trichelair, P., & Combes, R. T. D. (2019). Safe policy improvement with baseline bootstrapping. In
– volume: 23
  start-page: 740
  year: 2022
  end-page: 759
  ident: b0060
  article-title: Survey of deep reinforcement learning for motion planning of autonomous vehicles
  publication-title: IEEE Transactions on Intelligent Transportation Systems
– reference: Nadjahi, K., Laroche, R., & Combes, R. T. (2019). Safe policy improvement with soft baseline bootstrapping. arXiv preprint arXiv: 1907.05079v1.
– reference: Azar, M. G., Osband, I., & Munos, R. (2017). Minimax regret bounds for reinforcement learning. In
– reference: Farahmand, A. M., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2008). Regularized policy iteration. In
– reference: Riedmiller, M., Hafner, R., Lampe, T., Neunert, M., Degrave, J., Van De Wiele, T., & Springenberg, T. (2018). Learning by playing - Solving sparse reward tasks from scratch. In
– reference: Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., Tuyls, K., & Graepel, T. (2018). Value-decomposition networks for cooperative multi-agent learning. In
– reference: Bakhtin, A., Wu, D. J., Lerer, A., Gray, J., Jacob, A. P., Farina, G., Miller, A. H., & Brown, N. (2022). Mastering the game of no-press diplomacy via human-regularized reinforcement learning and planning. arXiv preprint arXiv:2210.05492v1.
– reference: Chaffre, T., Moras, J., Chan-Hon-Tong, A., & Marzat, J. (2020). Sim-to-real transfer with incremental environment complexity for reinforcement learning of depth-based robot navigation. In
– start-page: 1
  year: 2010
  end-page: 7
  ident: b0600
  article-title: Multiobjective reinforcement learning for traffic signal control using vehicular ad hoc network
  publication-title: EURASIP Journal on Advances in Signal Processing
– start-page: 1
  year: 2015
  end-page: 11
  ident: b1090
  article-title: Language understanding for text-based games using deep reinforcement learning
  publication-title: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
– reference: Lee, A. X., Nagabandi, A., Abbeel, P., & Levine, S. (2020). Stochastic latent actor-critic: Deep reinforcement learning with a latent variable model. In
– reference: Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., & Mordatch, I. (2020). Multi-agent actor-critic for mixed cooperative-competitive environments. In
– reference: Badia, A. P., Sprechmann, P., Vitvitskyi, A., Guo, D., Piot, B., Kapturowski, S., Tieleman, O., Arjovsky, M., Pritzel, A., Bolt, A., & Blundell, C. (2020b). Never give up: Learning directed exploration strategies. arXiv preprint arXiv:2002.06038.
– reference: Ha, D., & Eck, D. (2017). A neural representation of sketch drawings. arXiv preprint arXiv:1704.03477v4.
– start-page: 5998
  year: 2017
  end-page: 6008
  ident: b1615
  article-title: Attention is all you need
  publication-title: Proceedings of the 31st International Conference on Neural Information Processing Systems
– start-page: 305
  year: 1989
  end-page: 313
  ident: b1215
  article-title: ALVINN: An autonomous land vehicle in a neural network
  publication-title: Proceedings of the 1st International Conference on Advances in Neural Information Processing Systems
– volume: 6
  start-page: 5223
  year: 2021
  end-page: 5230
  ident: b1310
  article-title: Socially compliant robot navigation in crowded environment by human behavior resemblance using deep reinforcement learning
  publication-title: IEEE Robotics and Automation Letters
– year: 1911
  ident: b1590
  article-title: Animal intelligence
– reference: Pong, V. H., Nair, A., Smith, L., Huang, C., & Levine, S. (2022). Offline meta-reinforcement learning with online self-supervision. arXiv preprint arXiv: 2107.03974v4.
– volume: 29
  start-page: 2063
  year: 2018
  end-page: 2079
  ident: b0965
  article-title: Applications of deep learning and reinforcement learning to biological data
  publication-title: IEEE Transactions on Neural Networks and Learning Systems
– reference: Rashid, T., Farquhar, G., Peng, B., & Whiteson, S. (2020). Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. In
– start-page: 3986
  year: 2018
  end-page: 3995
  ident: b1160
  article-title: Reinforcement learning with function-valued action spaces for partial differential equation control
  publication-title: Proceedings of the 35th International Conference on Machine Learning
– volume: 50
  start-page: 3826
  year: 2020
  end-page: 3839
  ident: b1115
  article-title: Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications
  publication-title: IEEE Transactions on Cybernetics
– volume: 213
  start-page: 1
  year: 2023
  end-page: 13
  ident: b0905
  article-title: REDRL: A review-enhanced deep reinforcement learning model for interactive recommendation
  publication-title: Expert Systems With Applications
– volume: 159
  start-page: 96
  year: 2019
  end-page: 109
  ident: b0225
  article-title: Adversarial environment reinforcement learning algorithm for intrusion detection
  publication-title: Computer Networks
– reference: Melo, F. S., Meyn, S. P., & Ribeiro, M. I. (2008). An analysis of reinforcement learning with function approximation. In
– volume: 243
  start-page: 1
  year: 2022
  end-page: 10
  ident: b1735
  article-title: FusionSum: Abstractive summarization with sentence fusion and cooperative reinforcement learning
  publication-title: Knowledge-Based Systems
– reference: Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., & Mordatch, I. (2019). Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv: 1909.07528.
– start-page: 1
  year: 2022
  end-page: 21
  ident: b0465
  article-title: RLAS-BIABC: A reinforcement learning-based answer selection using the BERT model boosted by an improved ABC algorithm
  publication-title: Computational Intelligence and Neuroscience
– reference: Vinyals, O., Ewalds, T., Bartunov, S., Georgiev, P., Vezhnevets, A. S., Yeo, M., Makhzani, A., Küttler, H., Agapiou, J., Schrittwieser, J., Quan, J., Gaffney, S., Petersen, S., Simonyan, K., Schaul, T., Hasselt, H. V., Silver, D., Lillicrap, T., Calderone, K., Keet, P., Brunasso, A., Lawrence, D., Ekermo, A., Repp, J., & Tsing, R. (2017). StarCraft II: A new challenge for reinforcement learning. arXiv preprint arXiv: 1708.04782.
– reference: Peters, J., & Schaal, S. (2007). Applying the episodic natural actor-critic architecture to motor primitive learning. In
– volume: 27
  start-page: 846
  year: 2022
  end-page: 857
  ident: b1820
  article-title: Rule-based reinforcement learning for efficient robot navigation with space reduction
  publication-title: IEEE/ASME Transactions on Mechatronics
– volume: 199
  start-page: 1
  year: 2022
  end-page: 32
  ident: b1125
  article-title: Reinforcement learning in urban network traffic signal control: A systematic literature review
  publication-title: Expert Systems With Applications
– volume: 388
  start-page: 12
  year: 2020
  end-page: 23
  ident: b1725
  article-title: Integration of an actor-critic model and generative adversarial networks for a Chinese calligraphy robot
  publication-title: Neurocomputing
– reference: Engel, Y., Mannor, S., & Ron, M. (2005). Reinforcement learning with Gaussian processes. In
– volume: 15
  start-page: 319
  year: 2001
  end-page: 350
  ident: b0130
  article-title: Infinite-horizon policy-gradient estimation
  publication-title: Journal of Artificial Intelligence Research
– volume: 10
  start-page: 390
  year: 1965
  end-page: 398
  ident: b1645
  article-title: A heuristic approach to reinforcement learning control systems
  publication-title: IEEE Transactions on Automatic Control
– volume: 1977
  start-page: 25
  year: 1977
  end-page: 38
  ident: b1700
  article-title: Advanced forecasting methods for global crisis warning and models of intelligence
  publication-title: General Systems, XXII
– reference: Liu, F., & Qian, C. (2021). Prediction guided meta-learning for multi-objective reinforcement learning. In
– start-page: 1
  year: 2018
  end-page: 14
  ident: b1220
  article-title: Temporal difference models: Model-free deep RL for model-based control
  publication-title: Proceedings of the 6th International Conference on Learning Representations (ICLR)
– volume: 62
  start-page: 104
  year: 2016
  end-page: 115
  ident: b0345
  article-title: Neural networks based reinforcement learning for mobile robots obstacle avoidance
  publication-title: Expert Systems With Applications
– volume: 8
  start-page: 3075
  year: 2021
  end-page: 3087
  ident: b1790
  article-title: CDDPG: A deep-reinforcement-learning-based approach for electric vehicle charging control
  publication-title: IEEE Internet of Things Journal
– volume: 468
  year: 2022
  ident: b1775
  article-title: Deep neural networks based temporal-difference methods for high-dimensional parabolic partial differential equations
  publication-title: Journal of Computational Physics
– reference: Bloembergen, D., Kaisers, M., & Tuyls, K. (2010). Lenient frequency adjusted Q-learning. In
– volume: 49
  start-page: 8
  year: 1961
  end-page: 30
  ident: b1025
  article-title: Steps Toward Artificial Intelligence
  publication-title: Proceedings of the IRE
– volume: 6
  start-page: 503
  year: 2005
  end-page: 556
  ident: b0360
  article-title: Tree-based batch mode reinforcement learning
  publication-title: Journal of Machine Learning Research
– volume: 5
  start-page: 297
  year: 1966
  end-page: 303
  ident: b1000
  article-title: A survey of learning control systems
  publication-title: ISA Transactions
– reference: Mnih, V., Badia, A. P., Mirza, M., Graves, A., Harley, T., Lillicrap, T. P., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In
– volume: 12
  start-page: 1057
  year: 2000
  end-page: 1063
  ident: b1550
  article-title: Policy gradient methods for reinforcement learning with function approximation
  publication-title: Advances in Neural Information Processing Systems
– volume: 61
  start-page: 1848
  year: 2013
  end-page: 1862
  ident: b0695
  article-title: QD-Learning: A collaborative distributed strategy for multi-agent reinforcement learning through consensus + innovations
  publication-title: IEEE Transactions on Signal Processing
– reference: Crites, R. H., & Barto, A. G. (1994). An actor/critic algorithm that is equivalent to Q-learning. In
– year: 2019
  ident: b1235
  article-title: Language models are unsupervised multitask learners
– volume: 114
  start-page: 1
  year: 2022
  end-page: 18
  ident: b0015
  article-title: Cyber-security and reinforcement learning — A brief survey
  publication-title: Engineering Applications of Artificial Intelligence
– volume: 13
  start-page: 103
  year: 1993
  end-page: 130
  ident: b1055
  article-title: Prioritized sweeping: Reinforcement learning with less data and less time
  publication-title: Machine Learning
– volume: 7
  start-page: 6638
  year: 2022
  end-page: 6645
  ident: b0195
  article-title: VesNet-RL: Simulation-based reinforcement learning for real-world US probe navigation
  publication-title: IEEE Robotics and Automation Letters
– start-page: 1433
  year: 2008
  end-page: 1438
  ident: b1585
  article-title: Maximum entropy inverse reinforcement learning
  publication-title: Proceedings of the 23rd AAAI Conference on Artificial Intelligence
– reference: Palmer, G., Tuyls, K., Bloembergen, D., & Savani, R. (2018). Lenient multi-agent deep reinforcement learning. In
– reference: Maei, H. R., Szepesvari, C., Bhatnagar, S., Precup, D., Silver, D., & Sutton, R. S. (2009). Convergent temporal-difference learning with arbitrary smooth function approximation. In
– reference: Rummery, G. A., & Niranjan, M. (1994). On-Line Q-Learning using connectionist systems. In
– volume: 31
  start-page: 1573
  year: 2022
  end-page: 1586
  ident: b0915
  article-title: Video summarization through reinforcement with a 3D spatio-temporal U-net
  publication-title: IEEE Transactions on Image Processing
– volume: 13
  start-page: 227
  year: 2000
  end-page: 303
  ident: b0320
  article-title: Hierarchical reinforcement learning with the MAXQ value function decomposition
  publication-title: Journal of Artificial Intelligence Research
– reference: Watkins, C. J. C. H. (1989). Learning from delayed rewards, King’s College Cambridge, Ph.D. thesis.
– reference: Anderson, R. N., Boulanger, A., Powell, W. B., & Scott, W. (2011). Adaptive stochastic control for the smart grid. In
– volume: 46
  start-page: 8
  year: 2018
  end-page: 28
  ident: b0220
  article-title: Reinforcement learning for control: Performance, stability, and deep approximators
  publication-title: Annual Reviews in Control
– volume: 13
  start-page: 3041
  year: 2012
  end-page: 3074
  ident: b0815
  article-title: Finite-sample analysis of least-squares policy iteration
  publication-title: Journal of Machine Learning Research
– volume: 127
  start-page: 282
  year: 2019
  end-page: 294
  ident: b1395
  article-title: Reinforcement learning –Overview of recent progress and implications for process control
  publication-title: Computers and Chemical Engineering
– start-page: 4246
  year: 2016
  end-page: 4247
  ident: b0665
  article-title: The Malmo platform for artificial intelligence experimentation
  publication-title: Proceedings of the 25th International Joint Conference on Artificial Intelligence
– start-page: 1725
  year: 2014
  end-page: 1732
  ident: b0700
  article-title: Large-scale video classification with convolutional neural networks
  publication-title: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
– volume: 9
  start-page: 1
  year: 2019
  end-page: 19
  ident: b1665
  article-title: A text abstraction summary model based on BERT word embedding and reinforcement learning
  publication-title: Applied Sciences
– reference: Mingshuo, N., Dongming, C., & Dongqi, W. (2022). Reinforcement learning on graph: A survey. arXiv preprint arXiv:2204.06127v3.
– volume: 555
  start-page: 604
  year: 2018
  end-page: 610
  ident: b1380
  article-title: Planning chemical syntheses with deep neural networks and symbolic AI
  publication-title: Nature
– volume: 32
  start-page: 1238
  year: 2013
  end-page: 1274
  ident: b0755
  article-title: Reinforcement learning in robotics: A survey
  publication-title: International Journal of Robotics Research
– reference: Rashid, T., Samvelyan, M., Witt, C. S. de, Farquhar, G., Foerster, J., & Whiteson, S. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In
– year: 1960
  ident: b0605
  article-title: Dynamic Programming and Markov Processes
– reference: Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2020). Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv: 1912.01603v3.
– start-page: 663
  year: 2000
  end-page: 670
  ident: b1105
  article-title: Algorithms for inverse reinforcement learning
  publication-title: Proceedings of the 17th International Conference on Machine Learning
– reference: Turing, A. (1948). Intelligent machinery. Report for the National Physical Laboratory.
– volume: 71
  start-page: 1180
  year: 2008
  end-page: 1190
  ident: b1200
  article-title: Natural actor-critic
  publication-title: Neurocomputing
– volume: 49
  start-page: 337
  year: 2019
  end-page: 349
  ident: b0865
  article-title: Human-centered reinforcement learning: A survey
  publication-title: IEEE Transactions on Human-Machine Systems
– reference: Nagabandi, A., Kahn, G., Fearing, R. S., & Levine, S. (2017). Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. arXiv preprint arXiv: 1708.02596v2.
– reference: Stockfish: Strong open source chess engine. (2022). Retrieved from https://stockfishchess.org/. Accessed March 10, 2023.
– volume: 6
  start-page: S191
  year: 2020
  ident: b0045
  article-title: Introduction to deep learning
  publication-title: MIT Course Number
– reference: Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., Abbeel, P., Srinivas, A., & Mordatch, I. (2021). Decision transformer: Reinforcement learning via sequence modeling. arXiv preprint arXiv:2106.01345.
– volume: 45
  start-page: 2471
  year: 2009
  end-page: 2482
  ident: b0190
  article-title: Natural actor-critic algorithms
  publication-title: Automatica
– volume: 4
  start-page: 1107
  year: 2003
  end-page: 1149
  ident: b0805
  article-title: Least-squares policy iteration
  publication-title: Journal of Machine Learning Research
– reference: Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D., & Botvinick, M. (2016a). Learning to reinforcement learn. arXiv preprint arXiv: 1611.05763v3.
– reference: Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., Hasselt, H., & Silver, D. (2018). Distributed prioritized experience replay. In
– start-page: 448
  year: 2015
  end-page: 456
  ident: b0630
  article-title: Batch normalization: Accelerating deep network training by reducing internal covariate shift
  publication-title: 32nd International Conference on Machine Learning (ICML)
– reference: Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In
– volume: 22
  start-page: 1
  year: 2020
  end-page: 13
  ident: b0535
  article-title: Entanglement classification via neural network quantum states
  publication-title: New Journal of Physics
– reference: Hasselt, H. Van, Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-Learning. In
– volume: 602
  start-page: 298
  year: 2022
  end-page: 312
  ident: b1675
  article-title: A reinforcement learning level-based particle swarm optimization algorithm for large-scale optimization
  publication-title: Information Sciences
– reference: Moerland, T. M., Broekens, J., Plaat, A., & Jonker., C. M. (2022). Model-based reinforcement learning: A Survey. arXiv preprint arXiv: 2006.16712v4.
– reference: Peters, J., Mulling, K., & Altun, Y. (2010). Relative entropy policy search. In
– reference: Paine, T. L., Paduraru, C., Michi, A., Gulcehre, C., Żołna, K., Novikov, A., Wang, Z., & Freitas, N. de. (2020). Hyperparameter selection for offline reinforcement learning. arXiv preprint arXiv:2007.09055.
– volume: 70
  start-page: 377
  year: 2021
  end-page: 380
  ident: b0625
  article-title: Integrated process-system modelling and control through graph neural network and reinforcement learning
  publication-title: CIRP Annals
– reference: Badia, A. P., Piot, B., Kapturowski, S., Sprechmann, P., Vitvitskyi, A., Guo, D., & Blundell, C. (2020a). Agent57: Outperforming the atari human benchmark. arXiv preprint arXiv: 2003.13350v1.
– reference: Swazinna, P., Udluft, S., & Runkler, T. (2021). Overcoming model bias for robust offline deep reinforcement learning. arXiv preprint arXiv:2008.05533v4.
– volume: 10
  start-page: 2133
  year: 2009
  end-page: 2136
  ident: b1570
  article-title: RL-Glue: Language-independent software for reinforcement-learning experiments
  publication-title: Journal of Machine Learning Research
– volume: 67
  start-page: 1
  year: 2021
  end-page: 9
  ident: b0115
  article-title: Deep neural network based missing data prediction of electrocardiogram signal using multiagent reinforcement learning
  publication-title: Biomedical Signal Processing and Control
– reference: Achiam, J., Held, D., Tamar, A., & Abbeel, P. (2017). Constrained policy optimization. In
– year: 2018
  ident: b1620
  article-title: Programmatically interpretable reinforcement learning
  publication-title: Proceedings of the 35th International Conference on Machine Learning (PMLR)
– volume: 7
  year: 2021
  ident: b1245
  article-title: Autonomous reinforcement learning agent for chemical vapor deposition synthesis of quantum materials
  publication-title: npj Computational Materials
– volume: 73
  start-page: 1
  year: 2021
  end-page: 20
  ident: b1815
  article-title: Deep reinforcement learning in medical imaging: A literature review
  publication-title: Medical Image Analysis
– volume: 9
  start-page: 72661
  year: 2021
  end-page: 72669
  ident: b0705
  article-title: Reinforcing synthetic data for meticulous survival prediction of patients suffering from left ventricular systolic dysfunction
  publication-title: IEEE Access
– volume: 55
  start-page: 2733
  year: 2022
  end-page: 2819
  ident: b0820
  article-title: Deep reinforcement learning in computer vision: A comprehensive survey
  publication-title: Artificial Intelligence Review
– reference: Chua, K., Calandra, R., McAllister, R., & Levine, S. (2018). Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In
– reference: Maes, F., Fonteneau, R., Wehenkel, L., & Ernst, D. (2012). Policy search in a space of simple closed-form formulas: Towards interpretability of reinforcement learning. In: Ganascia, J. G., Lenca, P., & Petit, J. M. (Eds.), Discovery Science. Lecture Notes in Computer Science, vol 7569. Springer, Berlin, Heidelberg.
– start-page: 5739
  year: 2018
  end-page: 5743
  ident: b1755
  article-title: Towards sample efficient reinforcement learning
  publication-title: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI-18)
– reference: Kidambi, R., Rajeswaran, A., Netrapalli, P., & Joachims, T. (2020). MOReL: Model-based offline reinforcement learning. In
– reference: Fox, R., Pakman, A., & Tishby, N. (2016). Taming the noise in reinforcement learning via soft updates. In
– reference: Scheikl, P. M., Gyenes, B., Davitashvili, T., Younis, R., Schulze, A., Muller-Stich, B. P., Neumann. G., & Mathis-Ullrich, F. (2021). Cooperative assistance in robotic surgery through multi-agent reinforcement learning. In
– start-page: 10674
  year: 2021
  end-page: 10681
  ident: b1745
  article-title: Improving sample efficiency in model-free reinforcement learning from images
  publication-title: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-21)
– volume: 26
  start-page: 3757
  year: 2022
  end-page: 3775
  ident: b1635
  article-title: An integrated network embedding with reinforcement learning for explainable recommendation
  publication-title: Soft Computing - A Fusion of Foundations, Methodologies and Applications
– reference: Goodfellow, I. J., Pouget-abadie, J., Mirza, M., Xu, B., Warde-farley, D., Ozair, S., Courville. A., & Bengio, Y. (2014). Generative Adversarial Nets. In
– start-page: 1861
  year: 2018
  end-page: 1870
  ident: b0505
  article-title: Soft actor-critic: Off-policy Maximum entropy deep reinforcement learning with a stochastic actor
  publication-title: Proceedings of the 35th International Conference on Machine Learning 5
– volume: 25
  start-page: 176
  year: 2021
  end-page: 180
  ident: b1500
  article-title: Combining deep reinforcement learning with graph neural networks for optimal VNF placement
  publication-title: IEEE Communications Letters
– volume: 49
  start-page: 8
  issue: 1
  year: 1961
  ident: 10.1016/j.eswa.2023.120495_b1025
  article-title: Steps Toward Artificial Intelligence
  publication-title: Proceedings of the IRE
  doi: 10.1109/JRPROC.1961.287775
– start-page: 1433
  year: 2008
  ident: 10.1016/j.eswa.2023.120495_b1585
  article-title: Maximum entropy inverse reinforcement learning brian
– volume: 261
  start-page: 1
  year: 2014
  ident: 10.1016/j.eswa.2023.120495_b1740
  article-title: Reinforcement learning algorithms with function approximation: Recent advances and applications
  publication-title: Information Sciences
  doi: 10.1016/j.ins.2013.08.037
– ident: 10.1016/j.eswa.2023.120495_b1390
– volume: 12
  start-page: 875
  issue: 4
  year: 2001
  ident: 10.1016/j.eswa.2023.120495_b1050
  article-title: Learning to trade via direct reinforcement
  publication-title: IEEE Transactions on Neural Network
  doi: 10.1109/72.935097
– volume: 8
  start-page: 176598
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0710
  article-title: A systematic review on reinforcement learning-based robotics within the last decade
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2020.3027152
– volume: 9
  start-page: 1
  issue: 21
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b1665
  article-title: A text abstraction summary model based on BERT word embedding and reinforcement learning
  publication-title: Applied Sciences
  doi: 10.3390/app9214701
– start-page: 7667
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b0540
  article-title: Learning with safety constraints: Sample complexity of reinforcement learning for constrained MDPs
– volume: 191
  start-page: 116285
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1780
  article-title: A distributed real-time pricing strategy based on reinforcement learning approach for smart grid
  publication-title: Expert Systems With Applications
  doi: 10.1016/j.eswa.2021.116285
– volume: 13
  start-page: 41
  year: 2003
  ident: 10.1016/j.eswa.2023.120495_b0125
  article-title: Recent advances in hierarchical reinforcement learning
  publication-title: Discrete Event Dynamic Systems: Theory and Applications
  doi: 10.1023/A:1022140919877
– ident: 10.1016/j.eswa.2023.120495_b0385
– ident: 10.1016/j.eswa.2023.120495_b0660
– ident: 10.1016/j.eswa.2023.120495_b1325
  doi: 10.1109/IJCNN.2007.4371212
– volume: 610
  start-page: 47
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0390
  article-title: Discovering faster matrix multiplication algorithms with reinforcement learning
  publication-title: Nature
  doi: 10.1038/s41586-022-05172-4
– ident: 10.1016/j.eswa.2023.120495_b1335
  doi: 10.1109/IROS51168.2021.9636193
– ident: 10.1016/j.eswa.2023.120495_b1195
– ident: 10.1016/j.eswa.2023.120495_b1470
– volume: 134
  start-page: 57
  year: 2002
  ident: 10.1016/j.eswa.2023.120495_b0230
  article-title: Deep blue
  publication-title: Artificial Intelligence
  doi: 10.1016/S0004-3702(01)00129-1
– volume: 65
  start-page: 87
  year: 2017
  ident: 10.1016/j.eswa.2023.120495_b0570
  article-title: Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies
  publication-title: Engineering Applications of Artificial Intelligence
  doi: 10.1016/j.engappai.2017.07.005
– start-page: 3215
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b0580
  article-title: Rainbow: Combining improvements in deep reinforcement learning
– volume: 1977
  start-page: 25
  year: 1977
  ident: 10.1016/j.eswa.2023.120495_b1700
  article-title: Advanced forecasting methods for global crisis warning and models of intelligence
  publication-title: General Systems, XXI I
– volume: 61
  start-page: 1848
  issue: 7
  year: 2013
  ident: 10.1016/j.eswa.2023.120495_b0695
  article-title: QD-Learning : A collaborative distributed strategy for multi-agent reinforcement learning through
  publication-title: IEEE Transactions on Signal Process
  doi: 10.1109/TSP.2013.2241057
– volume: 159
  start-page: 96
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b0225
  article-title: Adversarial environment reinforcement learning algorithm for intrusion detection
  publication-title: Computer Networks
  doi: 10.1016/j.comnet.2019.05.013
– ident: 10.1016/j.eswa.2023.120495_b0350
– year: 2016
  ident: 10.1016/j.eswa.2023.120495_b0210
  publication-title: OpenAI Gym.
– volume: 32
  start-page: 1238
  issue: 11
  year: 2013
  ident: 10.1016/j.eswa.2023.120495_b0755
  article-title: Reinforcement learning in robotics: A survey
  publication-title: International Journal of Robotics Research
  doi: 10.1177/0278364913495721
– ident: 10.1016/j.eswa.2023.120495_b0780
– ident: 10.1016/j.eswa.2023.120495_b1345
  doi: 10.1007/978-3-031-22953-4_4
– ident: 10.1016/j.eswa.2023.120495_b0910
  doi: 10.1155/2021/5300189
– volume: 42
  start-page: 674
  year: 1997
  ident: 10.1016/j.eswa.2023.120495_b1595
  article-title: An analysis of temporal-difference learning with function approximation
  publication-title: IEEE Transactions on Automatic Control
  doi: 10.1109/9.580874
– start-page: 1995
  year: 2016
  ident: 10.1016/j.eswa.2023.120495_b1670
  article-title: Dueling network architectures for deep reinforcement learning
– volume: 331
  start-page: 443
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b1810
  article-title: Hybrid hierarchical reinforcement learning for online guidance and navigation with partial observability
  publication-title: Neurocomputing
  doi: 10.1016/j.neucom.2018.11.072
– ident: 10.1016/j.eswa.2023.120495_b0510
– volume: 8
  start-page: 229
  issue: 3
  year: 1992
  ident: 10.1016/j.eswa.2023.120495_b1705
  article-title: Simple statistical gradient-following algorithms for connectionist reinforcement learning
  publication-title: Machine Learning
  doi: 10.1023/A:1022672621406
– volume: 114
  start-page: 1
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0015
  article-title: Cyber-security and reinforcement learning — A brief survey
  publication-title: Engineering Applications of Artificial Intelligence
  doi: 10.1016/j.engappai.2022.105116
– ident: 10.1016/j.eswa.2023.120495_b1630
– start-page: 5285
  year: 2017
  ident: 10.1016/j.eswa.2023.120495_b1720
  article-title: Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation
– ident: 10.1016/j.eswa.2023.120495_b0975
– ident: 10.1016/j.eswa.2023.120495_b1515
– volume: 64
  start-page: 81
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0620
  article-title: Graph neural network and multi-agent reinforcement learning for machine-process-system integrated control to optimize production yield
  publication-title: Journal of Manufacturing Systems
  doi: 10.1016/j.jmsy.2022.05.018
– ident: 10.1016/j.eswa.2023.120495_b0545
– ident: 10.1016/j.eswa.2023.120495_b0995
  doi: 10.1145/1390156.1390240
– volume: 69
  start-page: 8554
  issue: 8
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0935
  article-title: Deep reinforcement learning-based demand response for smart facilities energy management
  publication-title: IEEE Transactions on Industrial Electronics
  doi: 10.1109/TIE.2021.3104596
– volume: 602
  start-page: 298
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1675
  article-title: A reinforcement learning level-based particle swarm optimization algorithm for large-scale optimization
  publication-title: Information Sciences
  doi: 10.1016/j.ins.2022.04.053
– volume: 8
  start-page: 225945
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0795
  article-title: Coverage path planning for decomposition reconfigurable grid-maps using deep reinforcement learning based travelling salesman problem
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2020.3045027
– ident: 10.1016/j.eswa.2023.120495_b0030
– ident: 10.1016/j.eswa.2023.120495_b0485
  doi: 10.1007/978-3-319-71682-4_5
– ident: 10.1016/j.eswa.2023.120495_b0425
– volume: 22
  start-page: 33
  year: 1996
  ident: 10.1016/j.eswa.2023.120495_b0205
  article-title: Linear least-squares algorithms for temporal difference learning
  publication-title: Machine Learning
  doi: 10.1023/A:1018056104778
– volume: 9
  start-page: 67259
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b0610
  article-title: Reward shaping based federated reinforcement learning
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2021.3074221
– ident: 10.1016/j.eswa.2023.120495_b0275
– start-page: 1
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b1805
  article-title: State representation learning for effective deep reinforcement learning
– ident: 10.1016/j.eswa.2023.120495_b0200
– start-page: 2681
  year: 2017
  ident: 10.1016/j.eswa.2023.120495_b1140
  article-title: Deep decentralized multi-task multi-agent reinforcement learning under partial observability
– start-page: 2746
  year: 2015
  ident: 10.1016/j.eswa.2023.120495_b1685
  article-title: Embed to control: A locally linear latent dynamics model for control from raw images
– volume: 550
  start-page: 354
  issue: 7676
  year: 2017
  ident: 10.1016/j.eswa.2023.120495_b1425
  article-title: Mastering the game of Go without human knowledge
  publication-title: Nature
  doi: 10.1038/nature24270
– ident: 10.1016/j.eswa.2023.120495_b0785
– volume: 67
  start-page: 1
  issue: 102508
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b0115
  article-title: Deep neural network based missing data prediction of electrocardiogram signal using multiagent reinforcement learning
  publication-title: Biomedical Signal Processing and Control
– volume: 6
  start-page: 679
  issue: 5
  year: 1957
  ident: 10.1016/j.eswa.2023.120495_b0160
  article-title: A Markovian decision process
  publication-title: Journal of Mathematics and Mechanics
– start-page: 177
  year: 2009
  ident: 10.1016/j.eswa.2023.120495_b1610
  article-title: A theoretical and empirical analysis of expected sarsa
– volume: 22
  start-page: 4550
  issue: 7
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1075
  article-title: A generative adversarial network enabled deep distributional reinforcement learning for transmission scheduling in internet of vehicles
  publication-title: IEEE Transactions on Intelligent Transportation Systems
  doi: 10.1109/TITS.2020.3033577
– ident: 10.1016/j.eswa.2023.120495_b0890
  doi: 10.1109/VTC2021-Spring51267.2021.9448710
– volume: 182
  start-page: 115127
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1460
  article-title: Deep graph convolutional reinforcement learning for financial portfolio management – DeepPocket
  publication-title: Expert Systems With Applications
  doi: 10.1016/j.eswa.2021.115127
– ident: 10.1016/j.eswa.2023.120495_b0315
– volume: 88
  start-page: 135
  issue: 2
  year: 1981
  ident: 10.1016/j.eswa.2023.120495_b1535
  article-title: Toward a modern theory of adaptive networks: Expectation and prediction
  publication-title: Psychological Review
  doi: 10.1037/0033-295X.88.2.135
– ident: 10.1016/j.eswa.2023.120495_b0245
  doi: 10.5220/0009821603140323
– ident: 10.1016/j.eswa.2023.120495_b0420
– ident: 10.1016/j.eswa.2023.120495_b0310
– ident: 10.1016/j.eswa.2023.120495_b0120
– ident: 10.1016/j.eswa.2023.120495_b0635
– volume: 173
  start-page: 114663
  issue: 2
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1765
  article-title: Reinforcement learning approach for resource allocation in humanitarian logistics
  publication-title: Expert Systems With Applications
  doi: 10.1016/j.eswa.2021.114663
– year: 2008
  ident: 10.1016/j.eswa.2023.120495_b0560
– ident: 10.1016/j.eswa.2023.120495_b1510
  doi: 10.1609/aaai.v34i04.6049
– volume: 13
  start-page: 227
  year: 2000
  ident: 10.1016/j.eswa.2023.120495_b0320
  article-title: Hierarchical reinforcement learning with the MAXQ value function decomposition
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.639
– volume: 50
  start-page: 1
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0435
  article-title: Applications of reinforcement learning for building energy efficiency control: A review
  publication-title: Journal of Building Engineering
  doi: 10.1016/j.jobe.2022.104165
– volume: 26
  start-page: 674
  issue: 5
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1825
  article-title: Deep reinforcement learning based mobile robot navigation: A review
  publication-title: Tsinghua Science and Technology
  doi: 10.26599/TST.2021.9010012
– volume: 9
  start-page: 5785
  issue: 8
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0330
  article-title: Trajectory design and access control for air – Ground coordinated communications system with multiagent deep reinforcement learning
  publication-title: IEEE Internet of Things Journal
  doi: 10.1109/JIOT.2021.3062091
– ident: 10.1016/j.eswa.2023.120495_b1085
– ident: 10.1016/j.eswa.2023.120495_b1360
– volume: 8
  start-page: 3075
  issue: 5
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1790
  article-title: CDDPG: A deep-reinforcement-learning-based approach for electric vehicle charging control
  publication-title: IEEE Internet of Things Journal
  doi: 10.1109/JIOT.2020.3015204
– ident: 10.1016/j.eswa.2023.120495_b0395
– ident: 10.1016/j.eswa.2023.120495_b0240
– ident: 10.1016/j.eswa.2023.120495_b0410
  doi: 10.1609/aaai.v32i1.11794
– volume: 38
  start-page: 126
  issue: 2–3
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b0775
  article-title: SWIRL : A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards
  publication-title: The International Journal of Robotics Research
  doi: 10.1177/0278364918784350
– start-page: 305
  year: 1989
  ident: 10.1016/j.eswa.2023.120495_b1215
  article-title: ALVINN: An autonomous land vehicle in a neural network
– ident: 10.1016/j.eswa.2023.120495_b1285
– volume: 619
  start-page: 930
  year: 2023
  ident: 10.1016/j.eswa.2023.120495_b1660
  article-title: Solving combinatorial optimization problems over graphs with BERT-Based deep reinforcement learning
  publication-title: Information Sciences
  doi: 10.1016/j.ins.2022.11.073
– volume: 62
  start-page: 104
  year: 2016
  ident: 10.1016/j.eswa.2023.120495_b0345
  article-title: Neural networks based reinforcement learning for mobile robots obstacle avoidance
  publication-title: Expert Systems With Applications
  doi: 10.1016/j.eswa.2016.06.021
– ident: 10.1016/j.eswa.2023.120495_b0400
– ident: 10.1016/j.eswa.2023.120495_b0735
– ident: 10.1016/j.eswa.2023.120495_b0830
– year: 2019
  ident: 10.1016/j.eswa.2023.120495_b1235
– volume: 6
  start-page: 503
  year: 2005
  ident: 10.1016/j.eswa.2023.120495_b0360
  article-title: Tree-based batch mode reinforcement learning
  publication-title: Journal of Machine Learning Research
– volume: 88
  start-page: 103360
  issue: 1
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0460
  article-title: Teaching a humanoid robot to walk faster through safe reinforcement learning
  publication-title: Engineering Applications of Artificial Intelligence
  doi: 10.1016/j.engappai.2019.103360
– volume: 27
  start-page: 846
  issue: 2
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1820
  article-title: Rule-based reinforcement learning for efficient robot navigation with space reduction
  publication-title: IEEE/ASME Transactions on Mechatronics
  doi: 10.1109/TMECH.2021.3072675
– start-page: 10674
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1745
  article-title: Improving sample efficiency in model-free reinforcement learning from images
– volume: 12
  start-page: 1057
  year: 2000
  ident: 10.1016/j.eswa.2023.120495_b1550
  article-title: Policy gradient methods for reinforcement learning with function approximation
  publication-title: Advances in Neural Information Processing Systems
– volume: 16
  start-page: 105
  year: 2002
  ident: 10.1016/j.eswa.2023.120495_b1450
  article-title: Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.859
– start-page: 448
  year: 2015
  ident: 10.1016/j.eswa.2023.120495_b0630
  article-title: Batch normalization: Accelerating deep network training by reducing internal covariate shift
– volume: 47
  start-page: 253
  year: 2013
  ident: 10.1016/j.eswa.2023.120495_b0145
  article-title: The arcade learning environment: An evaluation platform for general agents
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.3912
– volume: 27
  start-page: 1378
  issue: 9
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b0250
  article-title: AgentGraph: Toward universal dialogue management with structured deep reinforcement learning
  publication-title: IEEE/ACM Transactions on Audio Speech and Language Processing
  doi: 10.1109/TASLP.2019.2919872
– volume: 50
  start-page: 3826
  issue: 9
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b1115
  article-title: Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications
  publication-title: IEEE Transactions On Cybernetics
  doi: 10.1109/TCYB.2020.2977374
– volume: 54
  start-page: 1
  issue: 5
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1175
  article-title: Hierarchical reinforcement learning: A comprehensive survey
  publication-title: ACM Computing Survey
  doi: 10.1145/3453160
– ident: 10.1016/j.eswa.2023.120495_b1030
– ident: 10.1016/j.eswa.2023.120495_b0490
– start-page: 1
  year: 2015
  ident: 10.1016/j.eswa.2023.120495_b1090
  article-title: Language understanding for text-based games using deep reinforcement learning
– volume: 8
  start-page: 8557
  issue: 10
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b0380
  article-title: Distributed deep reinforcement learning for renewable energy accommodation assessment with communication uncertainty in internet of energy
  publication-title: IEEE Internet Of Things Journal
  doi: 10.1109/JIOT.2020.3046622
– ident: 10.1016/j.eswa.2023.120495_b0405
– volume: 6
  start-page: 5223
  issue: 3
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1310
  article-title: Socially compliant robot navigation in crowded environment by human behavior resemblance using deep reinforcement learning
  publication-title: IEEE Robotics and Automation Letters
  doi: 10.1109/LRA.2021.3071954
– ident: 10.1016/j.eswa.2023.120495_b0835
– volume: 183
  start-page: 107575
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b1505
  article-title: Efficient flow migration for NFV with Graph-aware deep reinforcement learning
  publication-title: Computer Networks
  doi: 10.1016/j.comnet.2020.107575
– start-page: 1
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0465
  article-title: RLAS-BIABC: A reinforcement learning-based answer selection using the bert model boosted by an improved ABC algorithm
  publication-title: Computational Intelligence and Neuroscience
  doi: 10.1155/2022/7839840
– volume: 468
  issue: 2022
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1775
  article-title: Deep neural networks based temporal-difference methods for high-dimensional parabolic partial differential equations
  publication-title: Journal of Computational Physics
– start-page: 1
  year: 2017
  ident: 10.1016/j.eswa.2023.120495_b1650
  article-title: Sample efficient actor-critic with experience replay
– volume: 8
  start-page: 171058
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0035
  article-title: Reinforcement learning interpretation methods: A survey
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2020.3023394
– ident: 10.1016/j.eswa.2023.120495_b1035
– ident: 10.1016/j.eswa.2023.120495_b1190
  doi: 10.1609/aaai.v24i1.7727
– ident: 10.1016/j.eswa.2023.120495_b1185
– year: 2018
  ident: 10.1016/j.eswa.2023.120495_b1620
  article-title: Programmatically interpretable reinforcement learning
– volume: 22
  start-page: 123
  year: 1996
  ident: 10.1016/j.eswa.2023.120495_b1455
  article-title: Reinforcement learning with replacing eligibility traces
  publication-title: Machine Learning
  doi: 10.1023/A:1018012322525
– ident: 10.1016/j.eswa.2023.120495_b0980
  doi: 10.1109/IROS.2007.4399095
– start-page: 1
  year: 2016
  ident: 10.1016/j.eswa.2023.120495_b1170
  article-title: Actor-mimic deep multitask and transfer reinforcement learning
– ident: 10.1016/j.eswa.2023.120495_b0495
– volume: 6
  start-page: 236
  issue: 1
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b0255
  article-title: Parallel planning: A new motion planning framework for autonomous driving
  publication-title: IEEE/CAA Journal of Automatica Sinica
  doi: 10.1109/JAS.2018.7511186
– ident: 10.1016/j.eswa.2023.120495_b0520
– ident: 10.1016/j.eswa.2023.120495_b0340
– volume: 518
  start-page: 529
  issue: 7540
  year: 2015
  ident: 10.1016/j.eswa.2023.120495_b1040
  article-title: Human-level control through deep reinforcement learning
  publication-title: Nature
  doi: 10.1038/nature14236
– ident: 10.1016/j.eswa.2023.120495_b0675
– ident: 10.1016/j.eswa.2023.120495_b0945
– volume: 104
  start-page: 104630
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0305
  article-title: Vision-based robust control framework based on deep reinforcement learning applied to autonomous ground vehicles
  publication-title: Control Engineering Practice
  doi: 10.1016/j.conengprac.2020.104630
– volume: 78
  start-page: 236
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b1165
  article-title: Reinforcement learning based compensation methods for robot manipulators
  publication-title: Engineering Applications of Artificial Intelligence
  doi: 10.1016/j.engappai.2018.11.006
– volume: 588
  start-page: 604
  issue: 7839
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b1350
  article-title: Mastering Atari, Go, chess and shogi by planning with a learned model
  publication-title: Nature
  doi: 10.1038/s41586-020-03051-4
– volume: 388
  start-page: 12
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b1725
  article-title: Integration of an actor-critic model and generative adversarial networks for a Chinese calligraphy robot
  publication-title: Neurocomputing
  doi: 10.1016/j.neucom.2020.01.043
– volume: 55
  start-page: 1
  issue: 7
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0020
  article-title: Reinforcement learning based recommender systems: A survey
  publication-title: ACM Computing Surveys
  doi: 10.1145/3543846
– ident: 10.1016/j.eswa.2023.120495_b0515
– start-page: 663
  year: 2000
  ident: 10.1016/j.eswa.2023.120495_b1105
  article-title: Algorithms for inverse reinforcement learning
– ident: 10.1016/j.eswa.2023.120495_b1415
– volume: 54
  start-page: 3215
  issue: 12
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0335
  article-title: A survey on multi-agent deep reinforcement learning: From the perspective of challenges and applications
  publication-title: Artificial Intelligence Review
– ident: 10.1016/j.eswa.2023.120495_b0175
– ident: 10.1016/j.eswa.2023.120495_b0450
– volume: 134
  start-page: 1
  issue: 1
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b0990
  article-title: Reinforcement learning for combinatorial optimization: A survey
  publication-title: Computers & Operations Research
– ident: 10.1016/j.eswa.2023.120495_b1045
  doi: 10.1561/9781638280576
– ident: 10.1016/j.eswa.2023.120495_b0940
  doi: 10.1016/j.engappai.2022.104848
– ident: 10.1016/j.eswa.2023.120495_b0280
– volume: 15
  start-page: 210
  year: 1970
  ident: 10.1016/j.eswa.2023.120495_b0430
  article-title: Learning control systems—Review and outlook
  publication-title: IEEE Transactions on Automatic Control
  doi: 10.1109/TAC.1970.1099405
– year: 2016
  ident: 10.1016/j.eswa.2023.120495_b1355
– volume: 7
  start-page: 6638
  issue: 3
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0195
  article-title: VesNet-RL: Simulation-based reinforcement learning for real-world US probe navigation
  publication-title: IEEE Robotics and Automation Letters
  doi: 10.1109/LRA.2022.3176112
– ident: 10.1016/j.eswa.2023.120495_b0885
– ident: 10.1016/j.eswa.2023.120495_b0455
– ident: 10.1016/j.eswa.2023.120495_b0730
– volume: 86
  start-page: 153
  issue: 2
  year: 2017
  ident: 10.1016/j.eswa.2023.120495_b1210
  article-title: Survey of model-based reinforcement learning: Applications on Robotics
  publication-title: Journal of Intelligent and Robotic Systems: Theory and Applications
  doi: 10.1007/s10846-017-0468-y
– ident: 10.1016/j.eswa.2023.120495_b0025
– ident: 10.1016/j.eswa.2023.120495_b1070
– volume: 81
  start-page: 15395
  issue: 11
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0715
  article-title: Deep reinforcement learning approach for manuscripts image classification and retrieval
  publication-title: Multimedia Tools and Applications
  doi: 10.1007/s11042-022-12572-1
– volume: 55
  start-page: 2733
  issue: 4
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0820
  article-title: Deep reinforcement learning in computer vision: A comprehensive survey
  publication-title: Artificial Intelligence Review
  doi: 10.1007/s10462-021-10061-9
– start-page: 1
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b1005
  article-title: Guided meta-policy search
– volume: 9
  start-page: 3259
  issue: 4
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b0285
  article-title: Convolutional neural networks for automatic state-time feature extraction in reinforcement learning applied to residential load control
  publication-title: IEEE Transactions on Smart Grid
  doi: 10.1109/TSG.2016.2629450
– volume: 71
  start-page: 2511
  issue: 3
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1065
  article-title: Bio-inspired collision avoidance in swarm systems via deep reinforcement learning
  publication-title: IEEE Transactions on Vehicular Technology
  doi: 10.1109/TVT.2022.3145346
– volume: 33
  start-page: 2045
  issue: 5
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0800
  article-title: Deep reinforcement learning with modulated Hebbian plus Q-network architecture
  publication-title: IEEE Transactions on Neural Networks and Learning Systems
  doi: 10.1109/TNNLS.2021.3110281
– ident: 10.1016/j.eswa.2023.120495_b1305
– volume: 34
  start-page: 286
  year: 1977
  ident: 10.1016/j.eswa.2023.120495_b1710
  article-title: An adaptive optimal controller for discrete-time markov environments
  publication-title: Information and Control
  doi: 10.1016/S0019-9958(77)90354-0
– volume: 27
  start-page: 1011
  issue: 2
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0235
  article-title: A learning-based vehicle trajectory-tracking approach for autonomous vehicles with lidar failure under various lighting conditions
  publication-title: IEEE/ASME Transactions on Mechatronics
  doi: 10.1109/TMECH.2021.3077388
– volume: 21
  start-page: 682
  issue: 4
  year: 2008
  ident: 10.1016/j.eswa.2023.120495_b1205
  article-title: Reinforcement learning of motor skills with policy gradients
  publication-title: Neural Networks
  doi: 10.1016/j.neunet.2008.02.003
– volume: 23
  start-page: 740
  issue: 2
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0060
  article-title: Survey of deep reinforcement learning for motion planning of autonomous vehicles
  publication-title: IEEE Transactions On Intelligent Transportation Systems
  doi: 10.1109/TITS.2020.3024655
– year: 1960
  ident: 10.1016/j.eswa.2023.120495_b0605
– volume: 50
  start-page: 119
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0090
  article-title: From inverse optimal control to inverse reinforcement learning: A historical review
  publication-title: Annual Reviews in Control
  doi: 10.1016/j.arcontrol.2020.06.001
– ident: 10.1016/j.eswa.2023.120495_b1135
– start-page: 3223
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b0585
  article-title: Deep Q-learning from demonstrations
– volume: 97
  start-page: 5331
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b1250
  publication-title: Efficient off-policy meta-reinforcement learning via probabilistic context variables
– start-page: 3207
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b0575
  article-title: Deep reinforcement learning that matters
– volume: 127
  start-page: 282
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b1395
  article-title: Reinforcement learning –Overview of recent progress and implications for process control
  publication-title: Computers and Chemical Engineering
  doi: 10.1016/j.compchemeng.2019.05.029
– volume: 38
  start-page: 58
  issue: 3
  year: 1995
  ident: 10.1016/j.eswa.2023.120495_b1580
  article-title: Temporal difference learning and TD-Gammon
  publication-title: Communication of the ACM
  doi: 10.1145/203330.203343
– volume: 4
  start-page: 132
  issue: 1
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b1785
  article-title: Energy-efficient scheduling for real-time systems based on deep Q-learning model
  publication-title: IEEE Transactions on Sustainable Computing
  doi: 10.1109/TSUSC.2017.2743704
– ident: 10.1016/j.eswa.2023.120495_b0920
– volume: 22
  start-page: 1
  issue: 6
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0375
  article-title: A novel reinforcement learning collision avoidance algorithm for usvs based on maneuvering characteristics and COLREGs
  publication-title: Sensors
  doi: 10.3390/s22062099
– volume: 40
  start-page: 935
  issue: 4
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0880
  article-title: GNN-based hierarchical deep reinforcement learning for NFV-oriented online resource orchestration in elastic optical DCIs
  publication-title: Journal of Lightwave Technology
  doi: 10.1109/JLT.2021.3125974
– volume: 40
  start-page: 75
  year: 2023
  ident: 10.1016/j.eswa.2023.120495_b0875
  article-title: Deep reinforcement learning in smart manufacturing: A review and prospects
  publication-title: CIRP Journal of Manufacturing Science and Technology
  doi: 10.1016/j.cirpj.2022.11.003
– volume: 13
  start-page: 2935
  issue: 4
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0265
  article-title: Reinforcement learning for selective key applications in power systems: Recent advances and future challenges
  publication-title: IEEE Transactions On Smart Grid
  doi: 10.1109/TSG.2022.3154718
– volume: 538
  start-page: 142
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b1715
  article-title: Adaptive stock trading strategies with deep reinforcement learning methods
  publication-title: Information Sciences
  doi: 10.1016/j.ins.2020.05.066
– ident: 10.1016/j.eswa.2023.120495_b0680
– volume: 8
  start-page: 208992
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0075
  article-title: Reinforcement learning techniques for optimal power control in grid-connected microgrids: A comprehensive review
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2020.3038735
– volume: 8
  start-page: 293
  year: 1992
  ident: 10.1016/j.eswa.2023.120495_b0895
  article-title: Self-improving reactive agents based on reinforcement learning, planning and teaching
  publication-title: Machine Learning
  doi: 10.1023/A:1022628806385
– ident: 10.1016/j.eswa.2023.120495_b0150
  doi: 10.1609/aaai.v26i1.8321
– ident: 10.1016/j.eswa.2023.120495_b1080
  doi: 10.1109/ICRA.2018.8463189
– ident: 10.1016/j.eswa.2023.120495_b0550
  doi: 10.1609/aaai.v30i1.10295
– volume: 95
  start-page: 103869
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0750
  article-title: Reinforcement learning for quadrupedal locomotion with design of continual–hierarchical curriculum
  publication-title: Engineering Applications of Artificial Intelligence
  doi: 10.1016/j.engappai.2020.103869
– volume: 4
  start-page: 1107
  issue: 6
  year: 2003
  ident: 10.1016/j.eswa.2023.120495_b0805
  article-title: Least-squares policy iteration
  publication-title: Journal of Machine Learning Research
– volume: 7
  start-page: 617
  issue: 2
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0925
  article-title: Parallel reinforcement learning-based energy efficiency improvement for a cyber-physical system
  publication-title: IEEE/CAA Journal of Automatica Sinica
  doi: 10.1109/JAS.2020.1003072
– ident: 10.1016/j.eswa.2023.120495_b1300
– volume: 596
  start-page: 583
  issue: 7873
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b0670
  article-title: Highly accurate protein structure prediction with AlphaFold
  publication-title: Nature
  doi: 10.1038/s41586-021-03819-2
– ident: 10.1016/j.eswa.2023.120495_b0840
– ident: 10.1016/j.eswa.2023.120495_b0135
– volume: 23
  start-page: 4909
  issue: 6
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0725
  article-title: Deep reinforcement learning for autonomous driving: A survey
  publication-title: IEEE Transactions On Intelligent Transportation Systems
  doi: 10.1109/TITS.2021.3054625
– ident: 10.1016/j.eswa.2023.120495_b1020
– volume: 21
  start-page: 3133
  issue: 4
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b0950
  article-title: Applications of deep reinforcement learning in communications and networking: A survey
  publication-title: IEEE Communications Surveys and Tutorials
  doi: 10.1109/COMST.2019.2916583
– ident: 10.1016/j.eswa.2023.120495_b0365
– ident: 10.1016/j.eswa.2023.120495_b0640
– year: 1972
  ident: 10.1016/j.eswa.2023.120495_b0170
– ident: 10.1016/j.eswa.2023.120495_b0860
  doi: 10.1145/1772690.1772758
– volume: 49
  start-page: 337
  issue: 4
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b0865
  article-title: Human-centered reinforcement learning: A survey
  publication-title: IEEE Transactions on Human-Machine Systems
  doi: 10.1109/THMS.2019.2912447
– start-page: 312
  year: 1996
  ident: 10.1016/j.eswa.2023.120495_b0530
  article-title: Adapting arbitrary normal mutation distributions in evolution strategies: The covariancematrix adaptation
– start-page: 3986
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b1160
  article-title: Reinforcement learning with function-valued action spaces for partial differential equation control
– volume: 9
  start-page: 1735
  issue: 8
  year: 1997
  ident: 10.1016/j.eswa.2023.120495_b0590
  article-title: Long Short-Term Memory
  publication-title: Neural Computation
  doi: 10.1162/neco.1997.9.8.1735
– year: 1994
  ident: 10.1016/j.eswa.2023.120495_b1230
– ident: 10.1016/j.eswa.2023.120495_b1295
– volume: 45
  start-page: 2471
  issue: 11
  year: 2009
  ident: 10.1016/j.eswa.2023.120495_b0190
  article-title: Natural actor-critic algorithms
  publication-title: Automatica
  doi: 10.1016/j.automatica.2009.07.008
– volume: 378
  start-page: 1092
  issue: 6624
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0855
  article-title: Competition-level code generation with AlphaCode
  publication-title: Science
  doi: 10.1126/science.abq1158
– volume: 2
  start-page: 137
  year: 1968
  ident: 10.1016/j.eswa.2023.120495_b1010
  article-title: BOXES, An experiment in adaptive control
  publication-title: Machine Intelligence
– ident: 10.1016/j.eswa.2023.120495_b1225
– volume: 20
  start-page: 61
  issue: 1
  year: 2009
  ident: 10.1016/j.eswa.2023.120495_b1320
  article-title: The graph neural network model
  publication-title: IEEE Transactions on Neural Networks
  doi: 10.1109/TNN.2008.2005605
– year: 1996
  ident: 10.1016/j.eswa.2023.120495_b0185
– ident: 10.1016/j.eswa.2023.120495_b0845
– ident: 10.1016/j.eswa.2023.120495_b1655
– volume: 521
  start-page: 436
  issue: 7553
  year: 2015
  ident: 10.1016/j.eswa.2023.120495_b0825
  article-title: Deep learning
  publication-title: Nature
  doi: 10.1038/nature14539
– ident: 10.1016/j.eswa.2023.120495_b0415
– start-page: 3040
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b0655
  article-title: Social influence as intrinsic motivation for multi-agent deep reinforcement learning
– start-page: 1054
  year: 2016
  ident: 10.1016/j.eswa.2023.120495_b1060
  article-title: Safe and efficient off-policy reinforcement learning
– volume: 11
  start-page: 11
  year: 1997
  ident: 10.1016/j.eswa.2023.120495_b0080
  article-title: Locally Weighted Learning
  publication-title: Artificial Intelligence Review
  doi: 10.1023/A:1006559212014
– ident: 10.1016/j.eswa.2023.120495_b1420
– ident: 10.1016/j.eswa.2023.120495_b0100
– year: 1982
  ident: 10.1016/j.eswa.2023.120495_b0745
– ident: 10.1016/j.eswa.2023.120495_b1575
– volume: 8
  start-page: 341
  year: 1992
  ident: 10.1016/j.eswa.2023.120495_b0300
  article-title: The convergence of TD(λ) for general λ
  publication-title: Machine Learning
  doi: 10.1023/A:1022632907294
– volume: 139
  start-page: 1
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b1120
  article-title: A review On reinforcement learning: Introduction and applications in industrial process control
  publication-title: Computers and Chemical Engineering
  doi: 10.1016/j.compchemeng.2020.106886
– ident: 10.1016/j.eswa.2023.120495_b0290
– start-page: 1
  year: 2010
  ident: 10.1016/j.eswa.2023.120495_b0600
  article-title: Multiobjective reinforcement learning for traffic signal control using vehicular ad hoc network
  publication-title: EURASIP Journal on Advances in Signal Processing
– ident: 10.1016/j.eswa.2023.120495_b0525
– ident: 10.1016/j.eswa.2023.120495_b0790
– year: 1998
  ident: 10.1016/j.eswa.2023.120495_b1540
– ident: 10.1016/j.eswa.2023.120495_b0955
– volume: 22
  start-page: 1
  issue: 4
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0535
  article-title: Entanglement classification via neural network quantum states
  publication-title: New Journal of Physics
  doi: 10.1088/1367-2630/ab783d
– volume: 6
  start-page: 355
  issue: 4
  year: 2014
  ident: 10.1016/j.eswa.2023.120495_b0555
  article-title: A neuroevolution approach to general atari game playing
  publication-title: IEEE Transactions on Computational Intelligence and AI in Games
  doi: 10.1109/TCIAIG.2013.2294713
– ident: 10.1016/j.eswa.2023.120495_b0445
– ident: 10.1016/j.eswa.2023.120495_b0720
– ident: 10.1016/j.eswa.2023.120495_b1485
  doi: 10.1145/1143844.1143955
– ident: 10.1016/j.eswa.2023.120495_b1680
– ident: 10.1016/j.eswa.2023.120495_b0085
– start-page: 5739
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b1755
  article-title: Towards sample efficient reinforcement learning
– volume: 41
  start-page: 256
  issue: 314
  year: 1950
  ident: 10.1016/j.eswa.2023.120495_b1385
  article-title: XXII. Programming a computer for playing chess
  publication-title: Philosophical Magazine and Journal of Science
  doi: 10.1080/14786445008521796
– volume: 199
  start-page: 1
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1125
  article-title: Reinforcement learning in urban network traffic signal control: A systematic literature review
  publication-title: Expert Systems With Applications
  doi: 10.1016/j.eswa.2022.116830
– volume: 40
  start-page: 1721
  year: 2013
  ident: 10.1016/j.eswa.2023.120495_b1015
  article-title: Neural network reinforcement learning for visual control of robot manipulator
  publication-title: Expert Systems With Applications
  doi: 10.1016/j.eswa.2012.09.010
– volume: 529
  start-page: 484
  issue: 7587
  year: 2016
  ident: 10.1016/j.eswa.2023.120495_b1405
  article-title: Mastering the game of Go with deep neural networks and tree search
  publication-title: Nature
  doi: 10.1038/nature16961
– start-page: 5872
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b1795
  article-title: Fully decentralized multi-agent reinforcement learning with networked agents
– start-page: 11
  year: 1975
  ident: 10.1016/j.eswa.2023.120495_b0740
  article-title: A comparison of natural and artificial intelligence
  publication-title: ACM SIGART Bulletin
  doi: 10.1145/1045236.1045237
– ident: 10.1016/j.eswa.2023.120495_b1340
– ident: 10.1016/j.eswa.2023.120495_b1495
– volume: v1.3.5
  start-page: 2013
  year: 2013
  ident: 10.1016/j.eswa.2023.120495_b1730
  publication-title: TORCS, The open racing car simulator
– ident: 10.1016/j.eswa.2023.120495_b1770
– volume: vol. 1
  year: 2005
  ident: 10.1016/j.eswa.2023.120495_b0180
– volume: 7
  issue: 108
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1245
  article-title: Autonomous reinforcement learning agent for chemical vapor deposition synthesis of quantum materials
  publication-title: npj Computational Materials
– volume: 13
  start-page: 3041
  year: 2012
  ident: 10.1016/j.eswa.2023.120495_b0815
  article-title: Finite-sample analysis of least-squares policy iteration
  publication-title: Journal of Machine Learning Research
– volume: 145
  start-page: 271
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1490
  article-title: Reinforcement learning and its connections with neuroscience and psychology
  publication-title: Neural Networks
  doi: 10.1016/j.neunet.2021.10.003
– year: 1927
  ident: 10.1016/j.eswa.2023.120495_b1180
– volume: 10
  start-page: 390
  year: 1965
  ident: 10.1016/j.eswa.2023.120495_b1645
  article-title: A heuristic approach to reinforcement learning control systems
  publication-title: IEEE Transactions on Automatic Control
  doi: 10.1109/TAC.1965.1098193
– volume: 112
  start-page: 181
  issue: 1999
  year: 1999
  ident: 10.1016/j.eswa.2023.120495_b1555
  article-title: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
  publication-title: Artificial Intelligence
  doi: 10.1016/S0004-3702(99)00052-1
– ident: 10.1016/j.eswa.2023.120495_b0140
– ident: 10.1016/j.eswa.2023.120495_b0440
– ident: 10.1016/j.eswa.2023.120495_b0010
– ident: 10.1016/j.eswa.2023.120495_b0765
– volume: 3
  start-page: 210
  issue: 3
  year: 1959
  ident: 10.1016/j.eswa.2023.120495_b1315
  article-title: Some studies in machine learning using the game of Chekers
  publication-title: IBM Journal of Research and Development
  doi: 10.1147/rd.33.0210
– ident: 10.1016/j.eswa.2023.120495_b1280
  doi: 10.1007/11564096_32
– volume: 555
  start-page: 604
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b1380
  article-title: Planning chemical syntheses with deep neural networks and symbolic AI
  publication-title: Nature
  doi: 10.1038/nature25978
– volume: 106
  start-page: 104451
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b1750
  article-title: Quantum deep reinforcement learning for rotor side converter control of double-fed induction generator-based wind turbines
  publication-title: Engineering Applications of Artificial Intelligence
  doi: 10.1016/j.engappai.2021.104451
– ident: 10.1016/j.eswa.2023.120495_b0595
– ident: 10.1016/j.eswa.2023.120495_b0870
– volume: 31
  start-page: 1573
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0915
  article-title: Video summarization through reinforcement with a 3D spatio-temporal U-net
  publication-title: IEEE Transactions on Image Processing
  doi: 10.1109/TIP.2022.3143699
– volume: 55
  start-page: 945
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1445
  article-title: Reinforcement learning in robotic applications: A comprehensive survey
  publication-title: Artificial Intelligence Review
  doi: 10.1007/s10462-021-09997-9
– volume: 46
  start-page: 8
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b0220
  article-title: Reinforcement learning for control: Performance, stability, and deep approximators
  publication-title: Annual Reviews in Control
  doi: 10.1016/j.arcontrol.2018.09.005
– start-page: 4246
  year: 2016
  ident: 10.1016/j.eswa.2023.120495_b0665
  article-title: The malmo platform for artificial intelligence experimentation
– volume: 4
  start-page: 217
  issue: 3
  year: 1981
  ident: 10.1016/j.eswa.2023.120495_b1530
  article-title: An adaptive network that constructs and uses an internal model of its world
  publication-title: Cognition and Brain Theory
– volume: 11
  start-page: 1563
  issue: 4
  year: 2010
  ident: 10.1016/j.eswa.2023.120495_b0645
  article-title: Near-optimal regret bounds for reinforcement learning
  publication-title: Journal of Machine Learning Research
– start-page: 216
  year: 1990
  ident: 10.1016/j.eswa.2023.120495_b1525
  article-title: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
– ident: 10.1016/j.eswa.2023.120495_b1560
  doi: 10.1016/j.engappai.2021.104366
– ident: 10.1016/j.eswa.2023.120495_b1640
– ident: 10.1016/j.eswa.2023.120495_b1365
– volume: 55
  start-page: 589
  issue: 10
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b0985
  article-title: Q-RTS : A real-time swarm intelligence based on multi-agent Q-learning
  publication-title: Electronics Letters
  doi: 10.1049/el.2019.0244
– volume: 575
  start-page: 350
  year: 2019
  ident: 10.1016/j.eswa.2023.120495_b1625
  article-title: Grandmaster level in StarCraft II using multi-agent reinforcement learning
  publication-title: Nature
  doi: 10.1038/s41586-019-1724-z
– volume: 6
  start-page: S191
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0045
  article-title: Introduction to deep learning
  publication-title: MIT Course Number
– start-page: 1
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0325
  article-title: Target-value-competition-based multi-agent deep reinforcement learning algorithm for distributed nonconvex economic dispatch
  publication-title: IEEE Transactions on power systems
– ident: 10.1016/j.eswa.2023.120495_b0475
– start-page: 1
  year: 2018
  ident: 10.1016/j.eswa.2023.120495_b1220
  article-title: Temporal difference models: Model-free deep RL for model-based control
– start-page: 1725
  year: 2014
  ident: 10.1016/j.eswa.2023.120495_b0700
  article-title: Large-scale video classification with convolutional neural networks
– volume: 18
  start-page: 2041
  issue: 3
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1240
  article-title: Modeling, detecting, and mitigating threats against industrial healthcare systems: A combined software defined networking and reinforcement learning approach
  publication-title: IEEE Transactions on Industrial Informatics
  doi: 10.1109/TII.2021.3093905
– ident: 10.1016/j.eswa.2023.120495_b1480
– ident: 10.1016/j.eswa.2023.120495_b0355
  doi: 10.1145/1102351.1102377
– ident: 10.1016/j.eswa.2023.120495_b0500
– start-page: 1
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0295
  article-title: Distributed actor-critic algorithms for multiagent reinforcement learning over directed graphs
  publication-title: IEEE Transactions On Neural Networks and Learning Systems
– ident: 10.1016/j.eswa.2023.120495_b0850
– start-page: 26
  year: 2017
  ident: 10.1016/j.eswa.2023.120495_b0070
  article-title: Deep reinforcement learning: A brief survey
  publication-title: IEEE Signal Processing Magazine
  doi: 10.1109/MSP.2017.2743240
– volume: 602
  start-page: 328
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1465
  article-title: AdaBoost maximum entropy deep inverse reinforcement learning with truncated gradient
  publication-title: Information Sciences
  doi: 10.1016/j.ins.2022.04.017
– ident: 10.1016/j.eswa.2023.120495_b0930
– volume: 5
  start-page: 27091
  year: 2017
  ident: 10.1016/j.eswa.2023.120495_b1110
  article-title: System design perspective for human-level agents using deep reinforcement learning: A survey
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2017.2777827
– volume: 57
  start-page: 469
  issue: 5
  year: 2009
  ident: 10.1016/j.eswa.2023.120495_b0065
  article-title: A survey of robot learning from demonstration
  publication-title: Robotics and Autonomous Systems
  doi: 10.1016/j.robot.2008.10.024
– ident: 10.1016/j.eswa.2023.120495_b1290
  doi: 10.1007/978-3-319-24574-4_28
– volume: 5
  start-page: 1143
  issue: 2
  year: 2020
  ident: 10.1016/j.eswa.2023.120495_b0040
  article-title: Learning robust control policies for end-to-end autonomous driving from data-driven simulation
  publication-title: IEEE Robotics and Automation Letters
  doi: 10.1109/LRA.2020.2966414
– volume: 71
  start-page: 1180
  year: 2008
  ident: 10.1016/j.eswa.2023.120495_b1200
  article-title: Natural actor-critic
  publication-title: Neurocomputing
  doi: 10.1016/j.neucom.2007.11.026
– volume: 84
  start-page: 109
  issue: 1–2
  year: 2011
  ident: 10.1016/j.eswa.2023.120495_b1400
  article-title: Informing sequential clinical decision-making through reinforcement learning: An empirical study
  publication-title: Machine Learning
  doi: 10.1007/s10994-010-5229-0
– ident: 10.1016/j.eswa.2023.120495_b0105
– ident: 10.1016/j.eswa.2023.120495_b0050
  doi: 10.1109/JPROC.2011.2109671
– ident: 10.1016/j.eswa.2023.120495_b1150
– volume: 1
  start-page: 228
  issue: 3
  year: 1958
  ident: 10.1016/j.eswa.2023.120495_b0165
  article-title: Dynamic programming and stochastic control processes
  publication-title: Information and Control
  doi: 10.1016/S0019-9958(58)80003-0
– ident: 10.1016/j.eswa.2023.120495_b1760
– ident: 10.1016/j.eswa.2023.120495_b0810
– volume: 45
  start-page: 2673
  issue: 11
  year: 1997
  ident: 10.1016/j.eswa.2023.120495_b1375
  article-title: Bidirectional recurrent neural networks
  publication-title: IEEE Transactions on Signal Processing
  doi: 10.1109/78.650093
– volume: 21
  start-page: 1
  issue: 4
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b0055
  article-title: Reinforcement learning-based complete area coverage path planning for a modified htrihex robot
  publication-title: Sensors
  doi: 10.3390/s21041067
– ident: 10.1016/j.eswa.2023.120495_b0960
  doi: 10.1007/978-3-642-33492-4_6
– ident: 10.1016/j.eswa.2023.120495_b1440
– ident: 10.1016/j.eswa.2023.120495_b0470
– volume: 49
  start-page: 161
  year: 2002
  ident: 10.1016/j.eswa.2023.120495_b1145
  article-title: Kernel-based reinforcement learning
  publication-title: Machine Learning
  doi: 10.1023/A:1017928328829
– volume: 213
  start-page: 1
  year: 2023
  ident: 10.1016/j.eswa.2023.120495_b0905
  article-title: REDRL: A review-enhanced deep reinforcement learning model for interactive recommendation
  publication-title: Expert Systems With Applications
  doi: 10.1016/j.eswa.2022.118926
– volume: 243
  start-page: 1
  issue: 108483
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b1735
  article-title: FusionSum: Abstractive summarization with sentence fusion and cooperative reinforcement learning
  publication-title: Knowledge-Based Systems
– volume: 13
  start-page: 103
  issue: 1
  year: 1993
  ident: 10.1016/j.eswa.2023.120495_b1055
  article-title: Prioritized sweeping: Reinforcement learning with less data and less time
  publication-title: Machine Learning
  doi: 10.1023/A:1022635613229
– ident: 10.1016/j.eswa.2023.120495_b0650
– volume: 21
  start-page: 363
  year: 2006
  ident: 10.1016/j.eswa.2023.120495_b1100
  article-title: Autonomous inverted helicopter flight via reinforcement learning. Experimental Robotics IX
  publication-title: Springer Tracts in Advanced Robotics
  doi: 10.1007/11552246_35
– volume: 55
  start-page: 895
  year: 2022
  ident: 10.1016/j.eswa.2023.120495_b0480
  article-title: Multi-agent deep reinforcement learning: A survey
  publication-title: Artificial Intelligence Review
  doi: 10.1007/s10462-021-09996-w
– ident: 10.1016/j.eswa.2023.120495_b1260
– ident: 10.1016/j.eswa.2023.120495_b1270
  doi: 10.7551/mitpress/9816.003.0050
– volume: 9
  start-page: 72661
  year: 2021
  ident: 10.1016/j.eswa.2023.120495_b0705
  article-title: Reinforcing synthetic data for meticulous survival prediction of patients suffering from left ventricular systolic dysfunction
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2021.3080617
– ident: 10.1016/j.eswa.2023.120495_b0110
– ident: 10.1016/j.eswa.2023.120495_b1155
– ident: 10.1016/j.eswa.2023.120495_b1430
– ident: 10.1016/j.eswa.2023.120495_b0095
– ident: 10.1016/j.eswa.2023.120495_b0370
– year: 2018
  ident: 10.1016/j.eswa.2023.120495_b1545
– ident: 10.1016/j.eswa.2023.120495_b1690
– volume: 3
  start-page: 72
  year: 1978
  ident: 10.1016/j.eswa.2023.120495_b1520
  article-title: Single channel theory: A neuronal theory of learning
  publication-title: Brain Theory Newsletter
– volume: 18
  start-page: 2936
  issue: 12
  year: 2006
  ident: 10.1016/j.eswa.2023.120495_b1565