A survey and critique of multiagent deep reinforcement learning


Bibliographic details
Published in: Autonomous Agents and Multi-Agent Systems, Vol. 33, No. 6, pp. 750-797
Main authors: Hernandez-Leal, Pablo; Kartal, Bilal; Taylor, Matthew E.
Format: Journal Article
Language: English
Published: New York: Springer US, 01.11.2019 (Springer Nature B.V.)
ISSN: 1387-2532, 1573-7454
Online access: Full text
Abstract Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) we revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings. (ii) We provide general guidelines to new practitioners in the area: describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research. (iii) We take a more critical tone raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community.
Authors
– Hernandez-Leal, Pablo (Borealis AI; ORCID 0000-0002-8530-6775; pablo.hernandez@borealisai.com)
– Kartal, Bilal (Borealis AI)
– Taylor, Matthew E. (Borealis AI)
ContentType Journal Article
Copyright Springer Science+Business Media, LLC, part of Springer Nature 2019
Copyright Springer Nature B.V. 2019
DOI 10.1007/s10458-019-09421-1
Discipline Computer Science
EISSN 1573-7454
EndPage 797
ISSN 1387-2532
IsPeerReviewed true
IsScholarly true
Issue 6
Keywords Multiagent learning
Deep reinforcement learning
Multiagent systems
Survey
Multiagent reinforcement learning
Language English
ORCID 0000-0002-8530-6775
PageCount 48
PublicationDate 2019-11-01
PublicationPlace New York
PublicationTitle Autonomous agents and multi-agent systems
PublicationTitleAbbrev Auton Agent Multi-Agent Syst
PublicationYear 2019
Publisher Springer US
Springer Nature B.V
– reference: Palmer, G., Savani, R., & Tuyls, K. (2019). Negative update intervals in deep multi-agent reinforcement learning. In 18th International conference on autonomous agents and multiagent systems.
– reference: Stone, P., Kaminka, G., Kraus, S., & Rosenschein, J. S. (2010). Ad Hoc autonomous agent teams: Collaboration without pre-coordination. In 32nd AAAI conference on artificial intelligence (pp. 1504–1509). Atlanta, Georgia, USA.
– reference: AhamedTIBorkarVSJunejaSAdaptive importance sampling technique for markov chains using stochastic approximationOperations Research200654348950422329751167.60343
– reference: Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning.
– reference: Wiering, M., & van Otterlo, M. (Eds.) (2012). Reinforcement learning. Adaptation, learning, and optimization (Vol. 12). Springer-Verlag Berlin Heidelberg.
– reference: Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., & de Freitas, N. (2016). Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224.
– reference: Such, F. P., Madhavan, V., Conti, E., Lehman, J., Stanley, K. O., & Clune, J. (2017). Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR arXiv:1712.06567.
– reference: Littman, M. L., & Stone, P. (2001). Implicit negotiation in repeated games. In ATAL ’01: revised papers from the 8th international workshop on intelligent agents VIII.
– reference: Palmer, G., Tuyls, K., Bloembergen, D., & Savani, R. (2018). Lenient multi-agent deep reinforcement learning. In International conference on autonomous agents and multiagent systems.
– reference: McCloskeyMCohenNJBowerGHCatastrophic interference in connectionist networks: The sequential learning problemPsychology of learning and motivation1989AmsterdamElsevier109165
– reference: Weinberg, M., & Rosenschein, J. S. (2004). Best-response multiagent learning in non-stationary environments. In Proceedings of the 3rd international conference on autonomous agents and multiagent systems (pp. 506–513). New York, NY, USA.
– reference: Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2017). Counterfactual multi-agent policy gradients. In 32nd AAAI conference on artificial intelligence.
– reference: BestGCliffOMPattenTMettuRRFitchRDec-MCTS: Decentralized planning for multi-robot active perceptionThe International Journal of Robotics Research2019382–3316337
– reference: FudenbergDTiroleJGame theory1991CambridgeThe MIT Press1339.91001
– reference: Raileanu, R., Denton, E., Szlam, A., & Fergus, R. (2018). Modeling others using oneself in multi-agent reinforcement learning. In International conference on machine learning.
– reference: TambeMTowards flexible teamworkJournal of Artificial Intelligence Research1997783124
– reference: Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in neural information processing systems (pp. 6379–6390).
– reference: AxelrodRHamiltonWDThe evolution of cooperationScience198121127139013966867471225.92037
– reference: Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., & Meger, D. (2018). Deep reinforcement learning that matters. In 32nd AAAI conference on artificial intelligence.
– reference: Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in neural information processing systems, pp. 369–376.
– reference: de Cote, E. M., Lazaric, A., & Restelli, M. (2006). Learning to cooperate in multi-agent social dilemmas. In Proceedings of the 5th international conference on autonomous agents and multiagent systems (pp. 783–785). Hakodate, Hokkaido, Japan.
– reference: Du, Y., Czarnecki, W. M., Jayakumar, S. M., Pascanu, R., & Lakshminarayanan, B. (2018). Adapting auxiliary losses using gradient similarity. arXiv preprint arXiv:1812.02224.
– reference: Andre, D., Friedman, N., & Parr, R. (1998). Generalized prioritized sweeping. In Advances in neural information processing systems (pp. 1001–1007).
– reference: Open AI Five. (2018). [Online]. Retrieved September 7, 2018, https://blog.openai.com/openai-five.
– reference: Foerster, J. N., Assael, Y. M., De Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. In Advances in neural information processing systems (pp. 2145–2153).
– reference: OmidshafieiSPapadimitriouCPiliourasGTuylsKRowlandMLespiauJBCzarneckiWMLanctotMPerolatJMunosRα\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha $$\end{document}-rank: Multi-agent evaluation by evolutionScientific Reports201999937
– reference: Kim, W., Cho, M., & Sung, Y. (2019). Message-dropout: An efficient training method for multi-agent deep reinforcement learning. In 33rd AAAI conference on artificial intelligence.
– reference: MnihVKavukcuogluKSilverDRusuAAVenessJBellemareMGGravesARiedmillerMFidjelandAKOstrovskiGPetersenSBeattieCSadikAAntonoglouIKingHKumaranDWierstraDLeggSHassabisDHuman-level control through deep reinforcement learningNature20155187540529533
– reference: Konda, V. R., & Tsitsiklis, J. (2000). Actor-critic algorithms. In Advances in neural information processing systems.
– reference: SrivastavaNHintonGKrizhevskyASutskeverISalakhutdinovRDropout: a simple way to prevent neural networks from overfittingThe Journal of Machine Learning Research20141511929195832315921318.68153
– reference: BishopCMPattern recognition and machine learning2006BerlinSpringer1107.68072
– reference: Isele, D., & Cosgun, A. (2018). Selective experience replay for lifelong learning. In Thirty-second AAAI conference on artificial intelligence.
– reference: Zinkevich, M., Johanson, M., Bowling, M., & Piccione, C. (2008). Regret minimization in games with incomplete information. In Advances in neural information processing systems (pp. 1729–1736).
– reference: LeCunYBengioYHintonGDeep learningNature20155217553436
– reference: Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). A brief survey of deep reinforcement learning. arXiv:1708.05866v2.
– reference: MachadoMCBellemareMGTalvitieEVenessJHausknechtMBowlingMRevisiting the arcade learning environment: Evaluation protocols and open problems for general agentsJournal of Artificial Intelligence Research201861523562378603106865755
– reference: GroszBJKrausSCollaborative plans for complex group actionArtificial Intelligence19968622693571420033
– reference: Johanson, M., Zinkevich, M. A., & Bowling, M. (2007). Computing robust counter-strategies. In Advances in neural information processing systems (pp. 721–728). Vancouver, BC, Canada.
– reference: Torrado, R. R., Bontrager, P., Togelius, J., Liu, J., & Perez-Liebana, D. (2018). Deep reinforcement learning for general video game AI. arXiv:1806.02448
– reference: BellemareMGNaddafYVenessJBowlingMThe arcade learning environment: An evaluation platform for general agentsJournal of Artificial Intelligence Research201347253279
– reference: HarsanyiJCGames with incomplete information played by “Bayesian” players, I–III part I. The basic modelManagement Science19671431591822466490207.51102
– reference: Hernandez-Leal, P., Kaisers, M., Baarslag, T., & Munoz de Cote, E. (2017). A survey of learning in multiagent environments—dealing with non-stationarity. arXiv:1707.09183.
– reference: ErnstDGeurtsPWehenkelLTree-based batch mode reinforcement learningJournal of Machine Learning Research20056Apr50355622498301222.68193
– reference: Steckelmacher, D., Roijers, D. M., Harutyunyan, A., Vrancx, P., Plisnier, H., & Nowé, A. (2018). Reinforcement learning in pomdps with memoryless options and option-observation initiation sets. In Thirty-second AAAI conference on artificial intelligence.
– reference: Babaeizadeh, M., Frosio, I., Tyree, S., Clemons, J., & Kautz, J. (2017). Reinforcement learning through asynchronous advantage actor-critic on a GPU. In International conference on learning representations.
– reference: Agogino, A. K., & Tumer, K. (2004). Unifying temporal and structural credit assignment problems. In Proceedings of 17th international conference on autonomous agents and multiagent systems.
– reference: ShammaJSArslanGDynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibriaIEEE Transactions on Automatic Control200550331232721230931366.91028
– reference: Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., & Munos, R. (2016). Unifying count-based exploration and intrinsic motivation. In Advances in neural information processing systems (pp. 1471–1479).
– reference: Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. In Thirtieth AAAI conference on artificial intelligence.
– reference: Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310–1318).
– reference: CamererCFHoTHChongJKA cognitive hierarchy model of gamesThe Quarterly Journal of Economics200411938611074.91503
– reference: TanMingMulti-Agent Reinforcement Learning: Independent vs. Cooperative AgentsMachine Learning Proceedings 19931993330337
– reference: RosenthalRThe file drawer problem and tolerance for null resultsPsychological Bulletin1979863638
– reference: Suau de Castro, M., Congeduti, E., Starre, R. A., Czechowski, A., & Oliehoek, F. A. (2019). Influence-based abstraction in deep reinforcement learning. In Adaptive, learning agents workshop.
– reference: van Hasselt, H., Doron, Y., Strub, F., Hessel, M., Sonnerat, N., & Modayil, J. (2018). Deep reinforcement learning and the deadly triad. CoRR arXiv:1812.02648.
– reference: Hong, Z. W., Su, S. Y., Shann, T. Y., Chang, Y. H., & Lee, C. Y. (2018). A deep policy inference Q-network for multi-agent systems. In International conference on autonomous agents and multiagent systems.
– reference: Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. In Proceedings of the 19th international joint conference on artificial intelligence (pp. 817–822). Edinburg, Scotland, UK.
– reference: Foerster, J. N., Nardelli, N., Farquhar, G., Afouras, T., Torr, P. H. S., Kohli, P., & Whiteson, S. (2017). Stabilising experience replay for deep multi-agent reinforcement learning. In International conference on machine learning.
– reference: BrownNSandholmTSuperhuman AI for heads-up no-limit poker: Libratus beats top professionalsScience2018359637441842437514621415.68163
– reference: Collaboration & Credit Principles, How can we be good stewards of collaborative trust? (2019). [Online]. Retrieved May 31, 2019, http://colah.github.io/posts/2019-05-Collaboration/index.html.
– reference: Mordatch, I., & Abbeel, P. (2018). Emergence of grounded compositional language in multi-agent populations. In Thirty-second AAAI conference on artificial intelligence.
– reference: Ortega, P. A., & Legg, S. (2018). Modeling friends and foes. arXiv:1807.00196
– reference: BrowneCBPowleyEWhitehouseDLucasSMCowlingPIRohlfshagenPTavenerSPerezDSamothrakisSColtonSA survey of Monte Carlo tree search methodsIEEE Transactions on Computational Intelligence and AI in Games201241143
– reference: Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. In Deep learning and representation learning workshop.
– reference: Do I really have to cite an arXiv paper? (2017). [Online]. Retrieved May 21, 2019, http://approximatelycorrect.com/2017/08/01/do-i-have-to-cite-arxiv-paper/.
– reference: Zhang, C., & Lesser, V. (2010). Multi-agent learning with policy prediction. In Twenty-fourth AAAI conference on artificial intelligence.
– reference: Johanson, M., Bard, N., Burch, N., & Bowling, M. (2012). Finding optimal abstract strategies in extensive-form games. In Twenty-sixth AAAI conference on artificial intelligence.
– reference: Omidshafiei, S., Hennes, D., Morrill, D., Munos, R., Perolat, J., Lanctot, M., Gruslys, A., Lespiau, J. B., & Tuyls, K. (2019). Neural replicator dynamics. arXiv e-prints arXiv:1906.00190.
– reference: Lauer, M., & Riedmiller, M. (2000). An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In Proceedings of the seventeenth international conference on machine learning.
– reference: Deep reinforcement learning: Pong from pixels. (2016). [Online]. Retrieved May 7, 2019, https://karpathy.github.io/2016/05/31/rl/.
– reference: Johanson, M., Waugh, K., Bowling, M., & Zinkevich, M. (2011). Accelerating best response calculation in large extensive games. In Twenty-second international joint conference on artificial intelligence.
– reference: Lazaridou, A., Peysakhovich, A., & Baroni, M. (2017). Multi-agent cooperation and the emergence of (natural) language. In International conference on learning representations.
– reference: Castellini, J., Oliehoek, F. A., Savani, R., & Whiteson, S. (2019). The representational capacity of action-value networks for multi-agent reinforcement learning. In 18th International conference on autonomous agents and multiagent systems.
– reference: Kulkarni, T. D., Narasimhan, K., Saeedi, A., & Tenenbaum, J. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in neural information processing systems (pp. 3675–3683).
– reference: ShohamYPowersRGrenagerTIf multi-agent learning is the answer, what is the question?Artificial Intelligence2007171736537723322841168.68493
– reference: Todorov, E., Erez, T., & Tassa, Y. (2012). MuJoCo - A physics engine for model-based control. In Intelligent robots and systems( pp. 5026–5033).
– reference: BloembergenDTuylsKHennesDKaisersMEvolutionary dynamics of multi-agent learning: A surveyJournal of Artificial Intelligence Research20155365969733895661336.68210
– reference: Azizzadenesheli, K. (2019). Maybe a few considerations in reinforcement learning research? In Reinforcement learning for real life workshop.
– reference: Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd international conference on machine learning (pp. 448–456).
– reference: Bono, G., Dibangoye, J. S., Matignon, L., Pereyron, F., & Simonin, O. (2018). Cooperative multi-agent policy gradient. In European conference on machine learning.
– reference: Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., Czarnecki, W. M., Dudzik, A., Huang, A., Georgiev, P., Powell, R., Ewalds, T., Horgan, D., Kroiss, M., Danihelka, I., Agapiou, J., Oh, J., Dalibard, V., Choi, D., Sifre, L., Sulsky, Y., Vezhnevets, S., Molloy, J., Cai, T., Budden, D., Paine, T., Gulcehre, C., Wang, Z., Pfaff, T., Pohlen, T., Wu, Y., Yogatama, D., Cohen, J., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Apps, C., Kavukcuoglu, K., Hassabis, D., & Silver, D. (2019). AlphaStar: Mastering the real-time strategy game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/
– reference: Iba, H. (1996). Emergent cooperation for multiple agents using genetic programming. In International conference on parallel problem solving from nature (pp. 32–41). Springer.
– reference: Song, Y., Wang, J., Lukasiewicz, T., Xu, Z., Xu, M., Ding, Z., & Wu, L. (2019). Arena: A general evaluation platform and building toolkit for multi-agent intelligence. CoRR arXiv:1905.08085.
– reference: Guestrin, C., Koller, D., & Parr, R. (2002). Multiagent planning with factored MDPs. In Advances in neural information processing systems (pp. 1523–1530).
– reference: Leibo, J. Z., Hughes, E., Lanctot, M., & Graepel, T. (2019). Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research. CoRR arXiv:1903.00742.
– reference: BowlingMConvergence and no-regret in multiagent learningAdvances in neural information processing systems2004CanadaVancouver209216
– reference: Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy-based policies. In Proceedings of the 34th international conference on machine learning (Vol. 70, pp. 1352–1361).
– reference: Hernandez-Leal, P., & Kaisers, M. (2017). Learning against sequential opponents in repeated stochastic games. In The 3rd multi-disciplinary conference on reinforcement learning and decision making. Ann Arbor.
– reference: BrownGWIterative solution of games by fictitious playActivity Analysis of Production and Allocation1951131374376562650045.09902
– reference: Samvelyan, M., Rashid, T., de Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G. J., Hung, C., Torr, P. H. S., Foerster, J. N., & Whiteson, S. (2019). The StarCraft multi-agent challenge. CoRR arXiv:1902.04043.
– reference: DarwicheAHuman-level intelligence or animal-like abilities?Communications of the ACM201861105667
– reference: SinghSJaakkolaTLittmanMLSzepesváriCConvergence results for single-step on-policy reinforcement-learning algorithmsMachine Learning20003832873080954.68127
– reference: TsitsiklisJAsynchronous stochastic approximation and Q-learningMachine Learning19941631852020820.68105
– reference: Forde, J. Z., & Paganini, M. (2019). The scientific method in the science of machine learning. In ICLR debugging machine learning models workshop.
– reference: Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI gym. arXiv preprint arXiv:1606.01540.
– reference: Rashid, T., Samvelyan, M., de Witt, C. S., Farquhar, G., Foerster, J. N., & Whiteson, S. (2018). QMIX - monotonic value function factorisation for deep multi-agent reinforcement learning. In International conference on machine learning.
– reference: Rusu, A. A., Colmenarejo, S. G., Gulcehre, C., Desjardins, G., Kirkpatrick, J., Pascanu, R., Mnih, V., Kavukcuoglu, K., & Hadsell, R. (2016). Policy distillation. In International conference on learning representations.
– reference: Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., & De Freitas, N. (2016). Dueling network architectures for deep reinforcement learning. In International conference on machine learning.
– reference: Melo, F. S., Meyn, S. P., & Ribeiro, M. I. (2008). An analysis of reinforcement learning with function approximation. In Proceedings of the 25th international conference on Machine learning (pp. 664–671). ACM.
– reference: Riedmiller, M. (2005). Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method. In European conference on machine learning (pp. 317–328). Springer.
– reference: Hausknecht, M., & Stone, P. (2015). Deep recurrent Q-learning for partially observable MDPs. In International conference on learning representations.
– reference: GuptaJKEgorovMKochenderferMSukthankarGRodriguez-AguilarJACooperative multi-agent control using deep reinforcement learningAutonomous agents and multiagent systems2017ChamSpringer6683
– reference: TuylsKWeissGMultiagent learning: Basics, challenges, and prospectsAI Magazine20123334152
– reference: CarmelDMarkovitchSIncorporating opponent models into adversary searchAAAI/IAAI19961120125
– reference: Foerster, J. N., Chen, R. Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., & Mordatch, I. (2018). Learning with opponent-learning awareness. In Proceedings of 17th international conference on autonomous agents and multiagent systems. Stockholm, Sweden.
– reference: Srinivasan, S., Lanctot, M., Zambaldi, V., Pérolat, J., Tuyls, K., Munos, R., & Bowling, M. (2018). Actor-critic policy optimization in partially observable multiagent environments. In Advances in neural information processing systems (pp. 3422–3435).
– reference: de WeerdHVerbruggeRVerheijBHow much does it help to know what she knows you know? An agent-based simulation studyArtificial Intelligence2013199–200C679230795661284.68567
– reference: BowlingMBurchNJohansonMTammelinOHeads-up limit hold’em poker is solvedScience20153476218145149
– reference: Walsh, W. E., Das, R., Tesauro, G., & Kephart, J. O. (2002). Analyzing complex strategic interactions in multi-agent systems. In AAAI-02 workshop on game-theoretic and decision-theoretic agents (pp. 109–118).
– reference: Papoudakis, G., Christianos, F., Rahman, A., & Albrecht, S. V. (2019). Dealing with non-stationarity in multi-agent deep reinforcement learning. arXiv preprint arXiv:1906.04737.
– reference: Schulman, J., Abbeel, P., & Chen, X. (2017) Equivalence between policy gradients and soft Q-learning. CoRR arXiv:1704.06440.
– reference: LinLJSelf-improving reactive agents based on reinforcement learning, planning and teachingMachine Learning199283–4293321
– reference: Suddarth, S. C., & Kergosien, Y. (1990). Rule-injection hints as a means of improving network performance and learning time. In Neural networks (pp. 120–129). Springer.
– reference: CrandallJWGoodrichMALearning to compete, coordinate, and cooperate in repeated games using reinforcement learningMachine Learning201182328131431081951237.68142
– reference: SamothrakisSLucasSRunarssonTRoblesDCoevolving game-playing agents: Measuring performance and intransitivitiesIEEE Transactions on Evolutionary Computation2013172213226
– reference: Zahavy, T., Ben-Zrihem, N., & Mannor, S. (2016). Graying the black box: Understanding DQNs. In International conference on machine learning (pp. 1899–1908).
– reference: Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing function approximation error in actor-critic methods. In International conference on machine learning.
– reference: Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2011). Protecting against evaluation overfitting in empirical reinforcement learning. In 2011 IEEE symposium on adaptive dynamic programming and reinforcement learning (ADPRL) (pp. 120–127). IEEE.
– reference: Hernandez-Leal, P., Taylor, M. E., Rosman, B., Sucar, L. E., & Munoz de Cote, E. (2016). Identifying and tracking switching, non-stationary opponents: A Bayesian approach. In Multiagent interaction without prior coordination workshop at AAAI. Phoenix, AZ, USA.
– reference: AstromKJOptimal control of Markov processes with incomplete state informationJournal of Mathematical Analysis and Applications19651011742051735700137.35803
– reference: Cassandra, A. R. (1998). Exact and approximate algorithms for partially observable Markov decision processes. Ph.D. thesis, Computer Science Department, Brown University.
– reference: Sutton, R. S., McAllester, D. A., Singh, S. P., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems.
– reference: Capture the Flag: The emergence of complex cooperative agents. (2018). [Online]. Retrieved September 7, 2018, https://deepmind.com/blog/capture-the-flag/ .
– reference: GmytrasiewiczPJDoshiPA framework for sequential planning in multiagent settingsJournal of Artificial Intelligence Research200524149791080.68664
– reference: Kartal, B., Hernandez-Leal, P., & Taylor, M. E. (2019). Using Monte Carlo tree search as a demonstrator within asynchronous deep RL. In AAAI workshop on reinforcement learning in games.
– reference: SandholmTWCritesRHMultiagent reinforcement learning in the iterated prisoner’s dilemmaBiosystems1996371–2147166
– reference: SchusterMPaliwalKKBidirectional recurrent neural networksIEEE Transactions on Signal Processing1997451126732681
– reference: WeiELukeSLenient learning in independent-learner stochastic cooperative gamesJournal of Machine Learning Research20161714235171071360.68720
– reference: Pinto, L., Davidson, J., Sukthankar, R., & Gupta, A. (2017). Robust adversarial reinforcement learning. In Proceedings of the 34th international conference on machine learning (Vol. 70, pp. 2817–2826). JMLR. org
– reference: Herbrich, R., Minka, T., & Graepel, T. (2007). TrueSkillTM\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{{\rm TM}}$$\end{document}: a Bayesian skill rating system. In Advances in neural information processing systems (pp. 569–576).
– reference: Clary, K., Tosch, E., Foley, J., & Jensen, D. (2018). Let’s play again: Variability of deep reinforcement learning agents in Atari environments. In NeurIPS critiquing and correcting trends workshop.
– reference: OliehoekFAAmatoCA concise introduction to decentralized POMDPs2016BerlinSpringer1355.68005
– reference: VodopivecTSamothrakisSSterBOn Monte Carlo tree search and reinforcement learningJournal of Artificial Intelligence Research201760881936374205406825260
– reference: Wei, E., Wicke, D., Freelan, D., & Luke, S. (2018). Multiagent soft Q-learning. arXiv:1804.09817
– reference: AgoginoAKTumerKAnalyzing and visualizing multiagent rewards in dynamic and stochastic domainsAutonomous Agents and Multi-Agent Systems2008172320338
– reference: Liu, H., Feng, Y., Mao, Y., Zhou, D., Peng, J., & Liu, Q. (2018). Action-depedent control variates for policy optimization via stein’s identity. In International conference on learning representations.
– reference: Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., & Graepel, T. (2018). The mechanics of n-player differentiable games. In Proceedings of the 35th international conference on machine learning, proceedings of machine learning research (pp. 354–363). Stockholm, Sweden.
– reference: StrehlALLittmanMLAn analysis of model-based interval estimation for Markov decision processesJournal of Computer and System Sciences20087481309133124602871157.68059
– reference: WhitesonSStonePEvolutionary function approximation for reinforcement learningJournal of Machine Learning Research20067May87791722743901222.68330
– reference: Azizzadenesheli, K., Yang, B., Liu, W., Brunskill, E., Lipton, Z., & Anandkumar, A. (2018). Surprising negative results for generative adversarial tree search. In Critiquing and correcting trends in machine learning workshop.
– reference: TampuuAMatiisenTKodeljaDKuzovkinIKorjusKAruJAruJVicenteRMultiagent cooperation and competition with deep reinforcement learningPLoS ONE2017124e0172395
– reference: Fulda, N., & Ventura, D. (2007). Predicting and preventing coordination problems in cooperative Q-learning systems. In Proceedings of the twentieth international joint conference on artificial intelligence (pp. 780–785). Hyderabad, India.
– reference: GmytrasiewiczPJDurfeeEHRational coordination in multi-agent environmentsAutonomous Agents and Multi-Agent Systems200034319350
– reference: SchmidhuberJDeep learning in neural networks: An overviewNeural Networks20156185117
– reference: Bowling, M. (2000). Convergence problems of general-sum multiagent reinforcement learning. In International conference on machine learning (pp. 89–94).
– reference: Oliehoek, F. A., Witwicki, S. J., & Kaelbling, L. P. (2012). Influence-based abstraction for multiagent systems. In Twenty-sixth AAAI conference on artificial intelligence.
– reference: NowéAVrancxPDe HauwereYMWieringMvan OtterloMGame theory and multi-agent reinforcement learningReinforcement learning2012BerlinSpringer4414701216.68229
– reference: Amodei, D., & Hernandez, D. (2018). AI and compute. https://blog.openai.com/ai-and-compute.
– reference: Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347.
– reference: Schulman, J., Levine, S., Abbeel, P., Jordan, M. I., & Moritz, P. (2015). Trust region policy optimization. In 31st international conference on machine learning. Lille, France.
– reference: Tesauro, G. (2003). Extending Q-learning to general adaptive multi-agent systems. In Advances in neural information processing systems (pp. 871–878). Vancouver, Canada.
– reference: Singh, S., Kearns, M., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. In Proceedings of the sixteenth conference on uncertainty in artificial intelligence (pp. 541–548). Morgan Kaufmann Publishers Inc.
– reference: Pyeatt, L. D., Howe, A. E., et al. (2001). Decision tree function approximation in reinforcement learning. In Proceedings of the third international symposium on adaptive systems: Evolutionary computation and probabilistic graphical models (Vol. 2, pp. 70–77). Cuba.
– reference: Suarez, J., Du, Y., Isola, P., & Mordatch, I. (2019). Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents. CoRR arXiv:1903.00784.
– reference: BeckerRZilbersteinSLesserVGoldmanCVSolving transition independent decentralized Markov decision processesJournal of Artificial Intelligence Research20042242345521294741080.68655
– reference: Heinrich, J., & Silver, D. (2016). Deep reinforcement learning from self-play in imperfect-information games. arXiv:1603.01121.
– reference: Precup, D., Sutton, R. S., & Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the seventeenth international conference on machine learning.
– reference: Gao, C., Hernandez-Leal, P., Kartal, B., & Taylor, M. E. (2019). Skynet: A top deep RL agent in the inaugural pommerman team competition. In 4th multidisciplinary conference on reinforcement learning and decision making.
– reference: Guestrin, C., Lagoudakis, M., & Parr, R. (2002). Coordinated reinforcement learning. In ICML (Vol. 2, pp. 227–234).
– reference: Van der Pol, E., & Oliehoek, F. A. (2016). Coordinated deep reinforcement learners for traffic light control. In Proceedings of learning, inference and control of multi-agent systems at NIPS.
– reference: Gu, S. S., Lillicrap, T., Turner, R. E., Ghahramani, Z., Schölkopf, B., & Levine, S. (2017). Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning. In Advances in neural information processing systems (pp. 3846–3855).
– reference: Meuleau, N., Peshkin, L., Kim, K. E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Proceedings of the fifteenth conference on uncertainty in artificial intelligence (pp. 427–436).
– reference: Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928–1937).
– reference: Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., & Kavukcuoglu, K. (2017). FeUdal networks for hierarchical reinforcement learning. In International conference on machine learning.
– reference: Lerer, A., & Peysakhovich, A. (2017). Maintaining cooperation in complex social dilemmas using deep reinforcement learning. CoRR arXiv:1707.01068.
– reference: Salimans, T., & Kingma, D. P. (2016). Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Advances in neural information processing systems (pp. 901–909).
– reference: Song, X., Wang, T., & Zhang, C. (2019). Convergence of multi-agent learning with a finite step size in general-sum games. In 18th International conference on autonomous agents and multiagent systems.
– reference: SpencerJRandomization, derandomization and antirandomization: three gamesTheoretical Computer Science1994131241542912889480805.90121
– reference: Gordon, G. J. (1999). Approximate solutions to Markov decision processes. Technical report, Carnegie-Mellon University.
– reference: Lipton, Z. C., Azizzadenesheli, K., Kumar, A., Li, L., Gao, J., & Deng, L. (2018). Combating reinforcement learning’s Sisyphean curse with intrinsic fear. arXiv:1611.01211v8.
– reference: Wolpert, D. H., & Tumer, K. (2002). Optimal payoff functions for members of collectives. In Modeling complexity in economic and social systems (pp. 355–369).
– reference: HuJWellmanMPNash Q-learning for general-sum stochastic gamesThe Journal of Machine Learning Research200341039106921253451094.68076
– reference: Gu, S., Lillicrap, T., Ghahramani, Z., Turner, R. E., & Levine, S. (2017). Q-prop: Sample-efficient policy gradient with an off-policy critic. In International conference on learning representations.
– reference: Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K. O., & Clune, J. (2019). Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995.
– reference: Johnson, M., Hofmann, K., Hutton, T., & Bignell, D. (2016). The Malmo platform for artificial intelligence experimentation. In IJCAI (pp. 4246–4247).
– reference: Lanctot, M., Zambaldi, V. F., Gruslys, A., Lazaridou, A., Tuyls, K., Pérolat, J., Silver, D., & Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. In Advances in neural information processing systems.
– reference: Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized experience replay. In International conference on learning representations.
– reference: Hinton, G., Vinyals, O., & Dean, J. (2014). Distilling the knowledge in a neural network. In NIPS deep learning workshop.
– reference: SilverDHuangAMaddisonCJGuezASifreLvan den DriesscheGSchrittwieserJAntonoglouIPanneershelvamVLanctotMDielemanSGreweDNhamJKalchbrennerNSutskeverILillicrapTLeachMKavukcuogluKGraepelTHassabisDMastering the game of go with deep neural networks and tree searchNature20165297587484489
– reference: EloAEThe rating of chessplayers, past and present1978NagoyaArco Pub.
– reference: Wang, H., Raj, B., & Xing, E. P. (2017). On the origin of deep learning. CoRR arXiv:1702.07800.
– reference: Panait, L., Sullivan, K., & Luke, S. (2006). Lenience towards teammates helps in cooperative multiagent learning. In Proceedings of the 5th international conference on autonomous agents and multiagent systems. Hakodate, Japan.
– reference: HochreiterSSchmidhuberJLong short-term memoryNeural Computation19979817351780
– reference: BowlingMVelosoMMultiagent learning using a variable learning rateArtificial Intelligence2002136221525018958190995.68075
– reference: Lin, L. J. (1991). Programming robots using reinforcement learning and teaching. In AAAI (pp. 781–786).
– reference: KalaiELehrerERational learning leads to Nash equilibriumEconometrica: Journal of the Econometric Society1993611019104512347920793.90106
– reference: Leibo, J. Z., Perolat, J., Hughes, E., Wheelwright, S., Marblestone, A. H., Duéñez-Guzmán, E., Sunehag, P., Dunning, I., & Graepel, T. (2019). Malthusian reinforcement learning. In 18th international conference on autonomous agents and multiagent systems.
– reference: Costa GomesMCrawfordVPBrosetaBCognition and behavior in normal-form games: An experimental studyEconometrica200169511931235
– reference: OpenAI Baselines: ACKTR & A2C. (2017). [Online]. Retrieved April 29, 2019, https://openai.com/blog/baselines-acktr-a2c/ .
– reference: Barrett, S., Stone, P., Kraus, S., & Rosenfeld, A. (2013). Teamwork with Limited Knowledge of Teammates. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, pp. 102–108. Bellevue, WS, USA.
– reference: Liu, S., Lever, G., Merel, J., Tunyasuvunakool, S., Heess, N., & Graepel, T. (2019). Emergent coordination through competition. In International conference on learning representations.
– reference: Lipton, Z. C., & Steinhardt, J. (2018). Troubling trends in machine learning scholarship. In ICML Machine Learning Debates workshop.
– reference: SzepesváriCLittmanMLA unified analysis of value-function-based reinforcement-learning algorithmsNeural Computation199911820172060
– reference: Tamar, A., Levine, S., Abbeel, P., Wu, Y., & Thomas, G. (2016). Value iteration networks. In NIPS (pp. 2154–2162).
– reference: LaurentGJMatignonLFort-PiatLThe world of independent learners is not MarkovianInternational Journal of Knowledge-based and Intelligent Engineering Systems20111515564
– reference: Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In International conference on learning representations.
– reference: OliehoekFASpaanMTVlassisNOptimal and approximate Q-value functions for decentralized POMDPsJournal of Artificial Intelligence Research20083228935324207381182.68261
– reference: Ilyas, A., Engstrom, L., Santurkar, S., Tsipras, D., Janoos, F., Rudolph, L., & Madry, A. (2018). Are deep policy gradient algorithms truly policy gradient algorithms? CoRR arXiv:1811.02553.
– reference: SzepesváriCAlgorithms for reinforcement learningSynthesis Lectures on Artificial Intelligence and Machine Learning20104111031205.68320
– reference: De JongKAEvolutionary computation: A unified approach2006CambridgeMIT press1106.68093
– reference: Konidaris, G., & Barto, A. (2006). Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd international conference on machine learning (pp. 489–496). ACM.
– reference: Heess, N., TB, D., Sriram, S., Lemmon, J., Merel, J., Wayne, G., Tassa, Y., Erez, T., Wang, Z., Eslami, S. M. A., Riedmiller, M. A., & Silver, D. (2017). Emergence of locomotion behaviours in rich environments. arXiv:1707.02286v2
– reference: Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., & Dunning, I., et al. (2018). IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In International conference on machine learning.
– reference: Hasselt, H. V. (2010). Double Q-learning. In Advances in neural information processing systems (pp. 2613–2621).
– reference: MondererDShapleyLSFictitious play property for games with identical interestsJournal of Economic Theory199668125826513724000849.90130
– reference: Pesce, E., & Montana, G. (2019). Improving coordination in multi-agent deep reinforcement learning through memory-driven communication. CoRR arXiv:1901.03887.
– reference: PowersRShohamYVuTA general criterion and an algorithmic framework for learning in multi-agent systemsMachine Learning2007671–24576
– reference: Damer, S., & Gini, M. (2017). Safely using predictions in general-sum normal form games. In Proceedings of the 16th conference on autonomous agents and multiagent systems. Sao Paulo.
– reference: HauskrechtMValue-function approximations for partially observable Markov decision processesJournal of Artificial Intelligence Research2000131339417818620946.68131
– reference: MooreAWAtkesonCGPrioritized sweeping: Reinforcement learning with less data and less timeMachine Learning1993131103130
– reference: ErdösPSelfridgeJLOn a combinatorial gameJournal of Combinatorial Theory, Series A19731432983013273130293.05004
– reference: SilverDSchrittwieserJSimonyanKAntonoglouIHuangAGuezAHubertTBakerLLaiMBoltonAMastering the game of go without human knowledgeNature20175507676354
– reference: Albrecht, S. V., & Ramamoorthy, S. (2013). A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems. In Proceedings of the 12th international conference on autonomous agents and multi-agent systems. Saint Paul, MN, USA.
– reference: Multiagent Learning, Foundations and Recent Trends. (2017). [Online]. Retrieved September 7, 2018, https://www.cs.utexas.edu/~larg/ijcai17_tutorial/multiagent_learning.pdf .
– reference: Resnick, C., Eldridge, W., Ha, D., Britz, D., Foerster, J., Togelius, J., Cho, K., & Bruna, J. (2018). Pommerman: A multi-agent playground. arXiv:1809.07124.
– reference: Watkins, J. (1989). Learning from delayed rewards. Ph.D. thesis, King’s College, Cambridge, UK
– reference: SilvaFLCostaAHRA survey on transfer learning for multiagent reinforcement learning systemsJournal of Artificial Intelligence Research201964645703393255907037594
– reference: WilliamsRJSimple statistical gradient-following algorithms for connectionist reinforcement learningMachine Learning199283–42292560772.68076
– reference: Raghu, M., Irpan, A., Andreas, J., Kleinberg, R., Le, Q., & Kleinberg, J. (2018). Can deep reinforcement learning solve Erdos–Selfridge-spencer games? In Proceedings of the 35th international conference on machine learning.
– reference: ConitzerVSandholmTAWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponentsMachine Learning2006671–22343
– reference: Bowling, M., & McCracken, P. (2005). Coordination and adaptation in impromptu teams. Proceedings of the nineteenth conference on artificial intelligence (Vol. 5, pp. 53–58).
– reference: Ling, C. K., Fang, F., & Kolter, J. Z. (2018). What game are we playing? End-to-end learning in normal and extensive form games. In Twenty-seventh international joint conference on artificial intelligence.
– reference: Devlin, S., Yliniemi, L. M., Kudenko, D., & Tumer, K. (2014). Potential-based difference rewards for multiagent reinforcement learning. In 13th International conference on autonomous agents and multiagent systems, AAMAS 2014. Paris, France.
– reference: Firoiu, V., Whitney, W. F., & Tenenbaum, J. B. (2017). Beating the World’s best at super smash Bros. with deep reinforcement learning. CoRR arXiv:1702.06230.
– reference: Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). Convergence of stochastic iterative dynamic programming algorithms. In Advances in neural information processing systems (pp. 703–710)
– reference: Zheng, Y., Hao, J., & Zhang, Z. (2018). Weighted double deep multiagent reinforcement learning in stochastic cooperative environments. arXiv:1802.08534.
– reference: Cuccu, G., Togelius, J., & Cudré-Mauroux, P. (2019). Playing Atari with six neurons. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 998–1006). International Foundation for Autonomous Agents and Multiagent Systems.
– reference: Grover, A., Al-Shedivat, M., Gupta, J. K., Burda, Y., & Edwards, H. (2018). Learning policy representations in multiagent systems. In International conference on machine learning.
– reference: Banerjee, B., & Peng, J. (2003). Adaptive policy gradient in multiagent learning. In Proceedings of the second international joint conference on Autonomous agents and multiagent systems (pp. 686–692). ACM.
– reference: Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., & Kavukcuoglu, K. (2017). Reinforcement learning with unsupervised auxiliary tasks. In International conference on learning representations.
– reference: Kaisers, M., & Tuyls, K. (2011). FAQ-learning in matrix games: demonstrating convergence near Nash equilibria, and bifurcation of attractors in the battle of sexes. In AAAI Workshop on Interactive Decision Theory and Game Theory (pp. 309–316). San Francisco, CA, USA.
– reference: TesauroGTemporal difference learning and TD-GammonCommunications of the ACM19953835868
– reference: Yu, Y. (2018). Towards sample efficient reinforcement learning. In IJCAI (pp. 5739–5743).
– reference: Schmidhuber, J. (1991). A possibility for implementing curiosity and boredom in model-building neural controllers. In Proceedings of the international conference on simulation of adaptive behavior: From animals to animats (pp. 222–227).
– reference: Dayan, P., & Hinton, G. E. (1993). Feudal reinforcement learning. In Advances in neural information processing systems (pp. 271–278).
– reference: Kamihigashi, T., & Le Van, C. (2015). Necessary and sufficient conditions for a solution of the bellman equation to be the value function: A general principle. https://halshs.archives-ouvertes.fr/halshs-01159177
– reference: ChakrabortyDStonePMultiagent learning in the presence of memory-bounded agentsAutonomous Agents and Multi-Agent Systems2013282182213
– reference: Bacchiani, G., Molinari, D., & Patander, M. (2019). Microscopic traffic simulation by cooperative multi-agent deep reinforcement learning. In AAMAS.
– reference: Bull, L. (1998). Evolutionary computing in multi-agent environments: Operators. In International conference on evolutionary programming (pp. 43–52). Springer.
– reference: Omidshafiei, S., Pazis, J., Amato, C., How, J. P., & Vian, J. (2017). Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In Proceedings of the 34th international conference on machine learning. Sydney.
– reference: MahadevanSConnellJAutomatic programming of behavior-based robots using reinforcement learningArtificial Intelligence1992552–3311365
– reference: Rabinowitz, N. C., Perbet, F., Song, H. F., Zhang, C., Eslami, S. M. A., & Botvinick, M. (2018). Machine theory of mind. In International conference on machine learning. Stockholm, Sweden.
– reference: Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th national conference on artificial intelligence (pp. 746–752). Madison, Wisconsin, USA.
– reference: NashJFEquilibrium points in n-person gamesProceedings of the National Academy of Sciences19503614849317010036.01104
– reference: Bellemare, M. G., Dabney, W., Dadashi, R., Taïga, A. A., Castro, P. S., & Roux, N. L., et al. (2019). A geometric perspective on optimal representations for reinforcement learning. CoRR arXiv:1901.11530.
– reference: MoriartyDESchultzACGrefenstetteJJEvolutionary algorithms for reinforcement learningJournal of Artificial Intelligence Research1999112412760924.68157
– reference: PutermanMLMarkov decision processes: Discrete stochastic dynamic programming1994New YorkWiley0829.90134
– reference: Shelhamer, E., Mahmoudieh, P., Argus, M., & Darrell, T. (2017). Loss is its own reward: Self-supervision for reinforcement learning. In ICLR workshops.
– reference: Castro, P. S., Moitra, S., Gelada, C., Kumar, S., Bellemare, M. G. (2018). Dopamine: A research framework for deep reinforcement learning. arXiv:1812.06110.
– reference: MatignonLLaurentGJLe Fort-PiatNIndependent reinforcement learners in cooperative Markov games: A survey regarding coordination problemsKnowledge Engineering Review2012271131
– reference: Ng, A. Y., Harada, D., & Russell, S. J. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the sixteenth international conference on machine learning (pp. 278–287).
– reference: Schmidhuber, J. (2015). Critique of Paper by “Deep Learning Conspiracy” (Nature 521 p 436). http://people.idsia.ch/~juergen/deep-learning-conspiracy.html.
– reference: Even-DarEMansourYLearning rates for Q-learningJournal of Machine Learning Research20035Dec12522479721222.68196
– reference: BartoAGMirolliMBaldassarreGIntrinsic motivation and reinforcement learningIntrinsically motivated learning in natural and artificial systems2013BerlinSpringer1747
– reference: Oliehoek, F. A. (2018). Interactive learning and decision making - foundations, insights & challenges. In International joint conference on artificial intelligence.
– reference: Pérolat, J., Piot, B., & Pietquin, O. (2018). Actor-critic fictitious play in simultaneous move multistage games. In 21st international conference on artificial intelligence and statistics.
– reference: Gullapalli, V., & Barto, A. G. (1992). Shaping as a method for accelerating reinforcement learning. In Proceedings of the 1992 IEEE international symposium on intelligent control (pp. 554–559). IEEE.
– reference: Li, Y. (2017). Deep reinforcement learning: An overview. CoRR arXiv:1701.07274.
  doi: 10.1613/jair.1.11396
– ident: 9421_CR199
– ident: 9421_CR18
– ident: 9421_CR249
  doi: 10.1145/1160633.1160776
– ident: 9421_CR336
  doi: 10.1609/aaai.v30i1.10295
– ident: 9421_CR137
  doi: 10.1609/aaai.v32i1.11694
– volume: 3
  start-page: 79
  issue: 1
  year: 1991
  ident: 9421_CR155
  publication-title: Neural Computation
  doi: 10.1162/neco.1991.3.1.79
– ident: 9421_CR133
– ident: 9421_CR265
– ident: 9421_CR288
– volume-title: A concise introduction to decentralized POMDPs
  year: 2016
  ident: 9421_CR237
  doi: 10.1007/978-3-319-28929-8
– ident: 9421_CR303
– volume: 38
  start-page: 316
  issue: 2–3
  year: 2019
  ident: 9421_CR35
  publication-title: The International Journal of Robotics Research
  doi: 10.1177/0278364918755924
– volume-title: Reinforcement learning: An introduction
  year: 2018
  ident: 9421_CR315
– ident: 9421_CR127
– ident: 9421_CR104
– ident: 9421_CR166
– ident: 9421_CR204
– ident: 9421_CR69
– ident: 9421_CR232
– volume: 61
  start-page: 85
  year: 2015
  ident: 9421_CR281
  publication-title: Neural Networks
  doi: 10.1016/j.neunet.2014.09.003
– ident: 9421_CR52
  doi: 10.1145/1150402.1150464
– ident: 9421_CR189
– ident: 9421_CR77
  doi: 10.1145/1160633.1160770
– ident: 9421_CR30
– ident: 9421_CR122
  doi: 10.1109/ISIC.1992.225046
– ident: 9421_CR13
– ident: 9421_CR177
  doi: 10.1109/ICNN.1997.616132
– ident: 9421_CR86
– volume: 17
  start-page: 335
  issue: 2
  year: 2005
  ident: 9421_CR227
  publication-title: Neural Computation
  doi: 10.1162/0899766053011528
– ident: 9421_CR331
– ident: 9421_CR161
– ident: 9421_CR361
  doi: 10.24963/ijcai.2018/820
– volume: 8
  start-page: 345
  issue: 3
  year: 2000
  ident: 9421_CR305
  publication-title: Autonomous Robots
  doi: 10.1023/A:1008942012299
– volume: 60
  start-page: 881
  year: 2017
  ident: 9421_CR340
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.5507
– ident: 9421_CR254
– ident: 9421_CR9
– volume: 356
  start-page: 508
  issue: 6337
  year: 2017
  ident: 9421_CR224
  publication-title: Science
  doi: 10.1126/science.aam6960
– ident: 9421_CR75
– ident: 9421_CR314
– ident: 9421_CR8
  doi: 10.1609/aaai.v29i1.9439
– volume: 28
  start-page: 182
  issue: 2
  year: 2013
  ident: 9421_CR66
  publication-title: Autonomous Agents and Multi-Agent Systems
  doi: 10.1007/s10458-013-9222-4
– ident: 9421_CR308
– ident: 9421_CR342
– volume: 33
  start-page: 235
  issue: 2–3
  year: 1998
  ident: 9421_CR74
  publication-title: Machine Learning
  doi: 10.1023/A:1007518724497
– ident: 9421_CR260
– ident: 9421_CR138
– ident: 9421_CR209
– ident: 9421_CR172
– ident: 9421_CR169
  doi: 10.1109/ICRA.2015.7139357
– ident: 9421_CR356
  doi: 10.1145/301136.301167
– ident: 9421_CR240
– ident: 9421_CR183
– start-page: 209
  volume-title: Advances in neural information processing systems
  year: 2004
  ident: 9421_CR42
– ident: 9421_CR343
– ident: 9421_CR234
– ident: 9421_CR366
– ident: 9421_CR148
– volume: 364
  start-page: 859
  issue: 6443
  year: 2019
  ident: 9421_CR156
  publication-title: Science
  doi: 10.1126/science.aau6249
– ident: 9421_CR22
– ident: 9421_CR124
  doi: 10.1007/978-3-319-71682-4_5
– ident: 9421_CR125
– ident: 9421_CR160
– ident: 9421_CR257
– ident: 9421_CR292
– ident: 9421_CR97
– volume: 27
  start-page: 819
  issue: 4
  year: 2002
  ident: 9421_CR34
  publication-title: Mathematics of Operations Research
  doi: 10.1287/moor.27.4.819.297
– ident: 9421_CR320
– volume: 8
  start-page: 229
  issue: 3–4
  year: 1992
  ident: 9421_CR354
  publication-title: Machine Learning
– ident: 9421_CR131
– volume: 11
  start-page: 241
  year: 1999
  ident: 9421_CR226
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.613
– ident: 9421_CR263
– ident: 9421_CR286
– start-page: 66
  volume-title: Autonomous agents and multiagent systems
  year: 2017
  ident: 9421_CR123
  doi: 10.1007/978-3-319-71682-4_5
– ident: 9421_CR359
  doi: 10.24963/ijcai.2018/79
– ident: 9421_CR348
– ident: 9421_CR11
– ident: 9421_CR63
– volume: 9
  start-page: 2579
  issue: Nov
  year: 2008
  ident: 9421_CR210
  publication-title: Journal of Machine Learning Research
– volume: 61
  start-page: 1019
  year: 1993
  ident: 9421_CR167
  publication-title: Econometrica: Journal of the Econometric Society
  doi: 10.2307/2951492
– ident: 9421_CR188
– ident: 9421_CR364
  doi: 10.1145/3219819.3219918
– ident: 9421_CR53
  doi: 10.1007/BFb0040758
– ident: 9421_CR297
– ident: 9421_CR80
– ident: 9421_CR225
  doi: 10.1609/aaai.v32i1.11492
– volume-title: Evolutionary computation: A unified approach
  year: 2006
  ident: 9421_CR85
– volume-title: Pattern recognition and machine learning
  year: 2006
  ident: 9421_CR36
– ident: 9421_CR228
– volume: 36
  start-page: 48
  issue: 1
  year: 1950
  ident: 9421_CR229
  publication-title: Proceedings of the National Academy of Sciences
  doi: 10.1073/pnas.36.1.48
– ident: 9421_CR355
  doi: 10.1142/9789812777263_0020
– ident: 9421_CR252
– ident: 9421_CR332
  doi: 10.1145/1329125.1329434
– volume: 4
  start-page: 1
  issue: 1
  year: 2012
  ident: 9421_CR51
  publication-title: IEEE Transactions on Computational Intelligence and AI in Games
  doi: 10.1109/TCIAIG.2012.2186810
– volume: 14
  start-page: 298
  issue: 3
  year: 1973
  ident: 9421_CR91
  publication-title: Journal of Combinatorial Theory, Series A
  doi: 10.1016/0097-3165(73)90005-8
– volume: 24
  start-page: 49
  issue: 1
  year: 2005
  ident: 9421_CR109
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.1579
– ident: 9421_CR136
– ident: 9421_CR10
– ident: 9421_CR245
– ident: 9421_CR119
– ident: 9421_CR23
  doi: 10.1145/860575.860686
– ident: 9421_CR108
– ident: 9421_CR349
– ident: 9421_CR79
– volume: 11
  start-page: 2017
  issue: 8
  year: 1999
  ident: 9421_CR319
  publication-title: Neural Computation
  doi: 10.1162/089976699300016070
– ident: 9421_CR326
– volume: 17
  start-page: 213
  issue: 2
  year: 2013
  ident: 9421_CR275
  publication-title: IEEE Transactions on Evolutionary Computation
  doi: 10.1109/TEVC.2012.2208755
– ident: 9421_CR298
– volume: 45
  start-page: 2673
  issue: 11
  year: 1997
  ident: 9421_CR285
  publication-title: IEEE Transactions on Signal Processing
  doi: 10.1109/78.650093
– volume: 529
  start-page: 484
  issue: 7587
  year: 2016
  ident: 9421_CR291
  publication-title: Nature
  doi: 10.1038/nature16961
– ident: 9421_CR143
– ident: 9421_CR171
– ident: 9421_CR313
– ident: 9421_CR46
– ident: 9421_CR360
– ident: 9421_CR17
– volume: 47
  start-page: 253
  year: 2013
  ident: 9421_CR32
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.3912
– volume: 28
  start-page: 41
  issue: 1
  year: 1997
  ident: 9421_CR62
  publication-title: Machine Learning
  doi: 10.1023/A:1007379606734
– ident: 9421_CR269
– ident: 9421_CR216
– volume: 10
  start-page: 174
  issue: 1
  year: 1965
  ident: 9421_CR14
  publication-title: Journal of Mathematical Analysis and Applications
  doi: 10.1016/0022-247X(65)90154-X
– volume: 11
  start-page: 219
  issue: 3–4
  year: 2018
  ident: 9421_CR101
  publication-title: Foundations and Trends® in Machine Learning
  doi: 10.1561/2200000071
– ident: 9421_CR159
– ident: 9421_CR57
– ident: 9421_CR311
  doi: 10.1007/3-540-52255-7_33
– volume: 32
  start-page: 289
  year: 2008
  ident: 9421_CR239
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.2447
– ident: 9421_CR165
– ident: 9421_CR40
– ident: 9421_CR60
  doi: 10.1057/9780230523371_8
– ident: 9421_CR150
  doi: 10.1007/3-540-61723-X_967
– ident: 9421_CR258
– ident: 9421_CR205
– volume: 211
  start-page: 1390
  issue: 27
  year: 1981
  ident: 9421_CR15
  publication-title: Science
  doi: 10.1126/science.7466396
– ident: 9421_CR5
– volume: 5
  start-page: 1
  issue: 1
  year: 1997
  ident: 9421_CR271
  publication-title: Evolutionary Computation
  doi: 10.1162/evco.1997.5.1.1
– ident: 9421_CR96
– start-page: 441
  volume-title: Reinforcement learning
  year: 2012
  ident: 9421_CR233
  doi: 10.1007/978-3-642-27645-3_14
– ident: 9421_CR264
– ident: 9421_CR238
  doi: 10.1145/1143997.1144059
– start-page: 330
  volume-title: Machine Learning Proceedings 1993
  year: 1993
  ident: 9421_CR323
– ident: 9421_CR158
– volume: 258
  start-page: 66
  year: 2018
  ident: 9421_CR6
  publication-title: Artificial Intelligence
  doi: 10.1016/j.artint.2018.01.002
– volume: 4
  start-page: 237
  year: 1996
  ident: 9421_CR164
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.301
– ident: 9421_CR187
– volume: 17
  start-page: 320
  issue: 2
  year: 2008
  ident: 9421_CR3
  publication-title: Autonomous Agents and Multi-Agent Systems
  doi: 10.1007/s10458-008-9046-9
– ident: 9421_CR106
– ident: 9421_CR141
– ident: 9421_CR273
– ident: 9421_CR236
  doi: 10.24963/ijcai.2018/813
– volume: 22
  start-page: 423
  year: 2004
  ident: 9421_CR28
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.1497
– volume: 61
  start-page: 56
  issue: 10
  year: 2018
  ident: 9421_CR81
  publication-title: Communications of the ACM
  doi: 10.1145/3271625
– ident: 9421_CR230
– ident: 9421_CR362
– ident: 9421_CR196
– ident: 9421_CR206
– volume: 7
  start-page: 83
  year: 1997
  ident: 9421_CR321
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.433
– ident: 9421_CR338
– volume: 11
  start-page: 387
  issue: 3
  year: 2005
  ident: 9421_CR248
  publication-title: Autonomous Agents and Multi-Agent Systems
  doi: 10.1007/s10458-005-2631-2
– ident: 9421_CR2
– ident: 9421_CR78
– volume: 13
  start-page: 374
  issue: 1
  year: 1951
  ident: 9421_CR49
  publication-title: Activity Analysis of Production and Allocation
– ident: 9421_CR112
– ident: 9421_CR304
  doi: 10.1609/aaai.v24i1.7529
– volume: 199–200
  start-page: 67
  issue: C
  year: 2013
  ident: 9421_CR76
  publication-title: Artificial Intelligence
  doi: 10.1016/j.artint.2013.05.004
– ident: 9421_CR135
– ident: 9421_CR153
  doi: 10.1609/aaai.v32i1.11595
– ident: 9421_CR267
– ident: 9421_CR241
– ident: 9421_CR84
– volume: 1
  start-page: 120
  year: 1996
  ident: 9421_CR61
  publication-title: AAAI/IAAI
– volume: 14
  start-page: 159
  issue: 3
  year: 1967
  ident: 9421_CR129
  publication-title: Management Science
  doi: 10.1287/mnsc.14.3.159
– ident: 9421_CR218
– volume-title: The rating of chessplayers, past and present
  year: 1978
  ident: 9421_CR90
– volume: 13
  start-page: 103
  issue: 1
  year: 1993
  ident: 9421_CR223
  publication-title: Machine Learning
– ident: 9421_CR344
– ident: 9421_CR235
– ident: 9421_CR67
– volume: 104
  start-page: 99
  issue: 1
  year: 2016
  ident: 9421_CR272
  publication-title: Machine Learning
  doi: 10.1007/s10994-016-5547-y
– ident: 9421_CR107
  doi: 10.1609/aiide.v15i1.5220
– volume: 3
  start-page: 319
  issue: 4
  year: 2000
  ident: 9421_CR110
  publication-title: Autonomous Agents and Multi-Agent Systems
  doi: 10.1023/A:1010028119149
– ident: 9421_CR256
– ident: 9421_CR68
  doi: 10.1609/aaai.v31i1.10810
– volume: 9
  start-page: 1735
  issue: 8
  year: 1997
  ident: 9421_CR147
  publication-title: Neural Computation
  doi: 10.1162/neco.1997.9.8.1735
– ident: 9421_CR170
– ident: 9421_CR312
– ident: 9421_CR146
– volume: 136
  start-page: 215
  issue: 2
  year: 2002
  ident: 9421_CR45
  publication-title: Artificial Intelligence
  doi: 10.1016/S0004-3702(02)00121-2
– volume: 68
  start-page: 258
  issue: 1
  year: 1996
  ident: 9421_CR222
  publication-title: Journal of Economic Theory
  doi: 10.1006/jeth.1996.0014
– ident: 9421_CR306
– ident: 9421_CR191
– volume: 12
  start-page: e0172395
  issue: 4
  year: 2017
  ident: 9421_CR322
  publication-title: PLoS ONE
  doi: 10.1371/journal.pone.0172395
– volume-title: Algorithmic game theory
  year: 2007
  ident: 9421_CR39
– ident: 9421_CR262
– ident: 9421_CR142
  doi: 10.1609/aiide.v15i1.5221
– volume: 38
  start-page: 287
  issue: 3
  year: 2000
  ident: 9421_CR294
  publication-title: Machine Learning
  doi: 10.1023/A:1007678930559
– ident: 9421_CR181
– ident: 9421_CR363
  doi: 10.1609/aaai.v24i1.7639
– ident: 9421_CR345
– ident: 9421_CR353
  doi: 10.1007/978-3-642-27645-3
– ident: 9421_CR89
– start-page: 17
  volume-title: Intrinsically motivated learning in natural and artificial systems
  year: 2013
  ident: 9421_CR27
  doi: 10.1007/978-3-642-32375-1_2
– ident: 9421_CR352
  doi: 10.1109/ADPRL.2011.5967363
– ident: 9421_CR251
– ident: 9421_CR190
  doi: 10.1609/aaai.v33i01.33014213
– volume: 171
  start-page: 365
  issue: 7
  year: 2007
  ident: 9421_CR289
  publication-title: Artificial Intelligence
  doi: 10.1016/j.artint.2006.02.006
– ident: 9421_CR176
  doi: 10.1145/1143844.1143906
– volume: 27
  start-page: 1
  issue: 1
  year: 2012
  ident: 9421_CR213
  publication-title: Knowledge Engineering Review
  doi: 10.1017/S0269888912000057
– ident: 9421_CR95
– ident: 9421_CR317
– volume: 16
  start-page: 1
  issue: 03
  year: 2002
  ident: 9421_CR7
  publication-title: Knowledge Engineering Review
– ident: 9421_CR152
– ident: 9421_CR246
– ident: 9421_CR175
– volume: 17
  start-page: 1
  year: 2016
  ident: 9421_CR347
  publication-title: Journal of Machine Learning Research
– ident: 9421_CR280
– ident: 9421_CR118
– ident: 9421_CR198
  doi: 10.1016/B978-1-55860-335-6.50027-1
– ident: 9421_CR300
– volume: 8
  start-page: 293
  issue: 3–4
  year: 1992
  ident: 9421_CR194
  publication-title: Machine Learning
– ident: 9421_CR186
– ident: 9421_CR1
– volume: 10
  start-page: 1633
  year: 2009
  ident: 9421_CR324
  publication-title: The Journal of Machine Learning Research
– ident: 9421_CR367
– volume: 61
  start-page: 523
  year: 2018
  ident: 9421_CR211
  publication-title: Journal of Artificial Intelligence Research
  doi: 10.1613/jair.5699
– volume: 40
  start-page: 1
  year: 2016
  ident: 9421_CR179
  publication-title: Behavioral and Brain Sciences
– ident: 9421_CR268
  doi: 10.1007/11564096_32
– ident: 9421_CR173
  doi: 10.1609/aaai.v33i01.33016079
– ident: 9421_CR44
– ident: 9421_CR279
– volume: 347
  start-page: 145
  issue: 6218
  year: 2015
  ident: 9421_CR43
  publication-title: Science
  doi: 10.1126/science.1259433
– volume: 86
  start-page: 269
  issue: 2
  year: 1996
  ident: 9421_CR115
  publication-title: Artificial Intelligence
  doi: 10.1016/0004-3702(95)00103-4
– ident: 9421_CR274
– ident: 9421_CR197
– ident: 9421_CR339
– ident: 9421_CR102
  doi: 10.1145/1390156.1390199
– ident: 9421_CR201
– volume: 38
  start-page: 156
  issue: 2
  year: 2008
  ident: 9421_CR55
  publication-title: IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews)
  doi: 10.1109/TSMCC.2007.913919
– ident: 9421_CR113
– ident: 9421_CR130
– ident: 9421_CR16