A survey and critique of multiagent deep reinforcement learning
| Published in: | Autonomous agents and multi-agent systems, Vol. 33, No. 6, pp. 750-797 |
|---|---|
| Main authors: | Hernandez-Leal, Pablo; Kartal, Bilal; Taylor, Matthew E. |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.11.2019 (Springer Nature B.V) |
| Subjects: | Multiagent learning; Deep reinforcement learning; Multiagent systems; Survey; Multiagent reinforcement learning |
| ISSN: | 1387-2532, 1573-7454 |
| Online access: | Full text |
| Abstract | Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) we revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings; (ii) we provide general guidelines to new practitioners in the area, describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research; and (iii) we take a more critical tone, raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community. |
|---|---|
| Author | Hernandez-Leal, Pablo (Borealis AI; ORCID 0000-0002-8530-6775; pablo.hernandez@borealisai.com); Kartal, Bilal (Borealis AI); Taylor, Matthew E. (Borealis AI) |
| ContentType | Journal Article |
| Copyright | Springer Science+Business Media, LLC, part of Springer Nature 2019; Copyright Springer Nature B.V. 2019 |
| DOI | 10.1007/s10458-019-09421-1 |
| Discipline | Computer Science |
| EISSN | 1573-7454 |
| EndPage | 797 |
| ISICitedReferencesCount | 385 |
| ISSN | 1387-2532 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Keywords | Multiagent learning; Deep reinforcement learning; Multiagent systems; Survey; Multiagent reinforcement learning |
| Language | English |
| ORCID | 0000-0002-8530-6775 |
| PageCount | 48 |
| PublicationDate | 2019-11-01 |
| PublicationPlace | New York |
| PublicationTitle | Autonomous agents and multi-agent systems |
| PublicationTitleAbbrev | Auton Agent Multi-Agent Syst |
| PublicationYear | 2019 |
| Publisher | Springer US; Springer Nature B.V |
In Advances in neural information processing systems (pp. 6379–6390). – reference: AxelrodRHamiltonWDThe evolution of cooperationScience198121127139013966867471225.92037 – reference: Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., & Meger, D. (2018). Deep reinforcement learning that matters. In 32nd AAAI conference on artificial intelligence. – reference: Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in neural information processing systems, pp. 369–376. – reference: de Cote, E. M., Lazaric, A., & Restelli, M. (2006). Learning to cooperate in multi-agent social dilemmas. In Proceedings of the 5th international conference on autonomous agents and multiagent systems (pp. 783–785). Hakodate, Hokkaido, Japan. – reference: Du, Y., Czarnecki, W. M., Jayakumar, S. M., Pascanu, R., & Lakshminarayanan, B. (2018). Adapting auxiliary losses using gradient similarity. arXiv preprint arXiv:1812.02224. – reference: Andre, D., Friedman, N., & Parr, R. (1998). Generalized prioritized sweeping. In Advances in neural information processing systems (pp. 1001–1007). – reference: Open AI Five. (2018). [Online]. Retrieved September 7, 2018, https://blog.openai.com/openai-five. – reference: Foerster, J. N., Assael, Y. M., De Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. In Advances in neural information processing systems (pp. 2145–2153). – reference: OmidshafieiSPapadimitriouCPiliourasGTuylsKRowlandMLespiauJBCzarneckiWMLanctotMPerolatJMunosRα\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha $$\end{document}-rank: Multi-agent evaluation by evolutionScientific Reports201999937 – reference: Kim, W., Cho, M., & Sung, Y. (2019). Message-dropout: An efficient training method for multi-agent deep reinforcement learning. In 33rd AAAI conference on artificial intelligence. – reference: MnihVKavukcuogluKSilverDRusuAAVenessJBellemareMGGravesARiedmillerMFidjelandAKOstrovskiGPetersenSBeattieCSadikAAntonoglouIKingHKumaranDWierstraDLeggSHassabisDHuman-level control through deep reinforcement learningNature20155187540529533 – reference: Konda, V. R., & Tsitsiklis, J. (2000). Actor-critic algorithms. In Advances in neural information processing systems. – reference: SrivastavaNHintonGKrizhevskyASutskeverISalakhutdinovRDropout: a simple way to prevent neural networks from overfittingThe Journal of Machine Learning Research20141511929195832315921318.68153 – reference: BishopCMPattern recognition and machine learning2006BerlinSpringer1107.68072 – reference: Isele, D., & Cosgun, A. (2018). Selective experience replay for lifelong learning. In Thirty-second AAAI conference on artificial intelligence. – reference: Zinkevich, M., Johanson, M., Bowling, M., & Piccione, C. (2008). Regret minimization in games with incomplete information. In Advances in neural information processing systems (pp. 1729–1736). – reference: LeCunYBengioYHintonGDeep learningNature20155217553436 – reference: Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). A brief survey of deep reinforcement learning. arXiv:1708.05866v2. 
– reference: MachadoMCBellemareMGTalvitieEVenessJHausknechtMBowlingMRevisiting the arcade learning environment: Evaluation protocols and open problems for general agentsJournal of Artificial Intelligence Research201861523562378603106865755 – reference: GroszBJKrausSCollaborative plans for complex group actionArtificial Intelligence19968622693571420033 – reference: Johanson, M., Zinkevich, M. A., & Bowling, M. (2007). Computing robust counter-strategies. In Advances in neural information processing systems (pp. 721–728). Vancouver, BC, Canada. – reference: Torrado, R. R., Bontrager, P., Togelius, J., Liu, J., & Perez-Liebana, D. (2018). Deep reinforcement learning for general video game AI. arXiv:1806.02448 – reference: BellemareMGNaddafYVenessJBowlingMThe arcade learning environment: An evaluation platform for general agentsJournal of Artificial Intelligence Research201347253279 – reference: HarsanyiJCGames with incomplete information played by “Bayesian” players, I–III part I. The basic modelManagement Science19671431591822466490207.51102 – reference: Hernandez-Leal, P., Kaisers, M., Baarslag, T., & Munoz de Cote, E. (2017). A survey of learning in multiagent environments—dealing with non-stationarity. arXiv:1707.09183. – reference: ErnstDGeurtsPWehenkelLTree-based batch mode reinforcement learningJournal of Machine Learning Research20056Apr50355622498301222.68193 – reference: Steckelmacher, D., Roijers, D. M., Harutyunyan, A., Vrancx, P., Plisnier, H., & Nowé, A. (2018). Reinforcement learning in pomdps with memoryless options and option-observation initiation sets. In Thirty-second AAAI conference on artificial intelligence. – reference: Babaeizadeh, M., Frosio, I., Tyree, S., Clemons, J., & Kautz, J. (2017). Reinforcement learning through asynchronous advantage actor-critic on a GPU. In International conference on learning representations. – reference: Agogino, A. K., & Tumer, K. (2004). Unifying temporal and structural credit assignment problems. In Proceedings of 17th international conference on autonomous agents and multiagent systems. – reference: ShammaJSArslanGDynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibriaIEEE Transactions on Automatic Control200550331232721230931366.91028 – reference: Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., & Munos, R. (2016). Unifying count-based exploration and intrinsic motivation. In Advances in neural information processing systems (pp. 1471–1479). – reference: Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. In Thirtieth AAAI conference on artificial intelligence. – reference: Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310–1318). – reference: CamererCFHoTHChongJKA cognitive hierarchy model of gamesThe Quarterly Journal of Economics200411938611074.91503 – reference: TanMingMulti-Agent Reinforcement Learning: Independent vs. Cooperative AgentsMachine Learning Proceedings 19931993330337 – reference: RosenthalRThe file drawer problem and tolerance for null resultsPsychological Bulletin1979863638 – reference: Suau de Castro, M., Congeduti, E., Starre, R. A., Czechowski, A., & Oliehoek, F. A. (2019). Influence-based abstraction in deep reinforcement learning. In Adaptive, learning agents workshop. – reference: van Hasselt, H., Doron, Y., Strub, F., Hessel, M., Sonnerat, N., & Modayil, J. (2018). 
Deep reinforcement learning and the deadly triad. CoRR arXiv:1812.02648. – reference: Hong, Z. W., Su, S. Y., Shann, T. Y., Chang, Y. H., & Lee, C. Y. (2018). A deep policy inference Q-network for multi-agent systems. In International conference on autonomous agents and multiagent systems. – reference: Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. In Proceedings of the 19th international joint conference on artificial intelligence (pp. 817–822). Edinburg, Scotland, UK. – reference: Foerster, J. N., Nardelli, N., Farquhar, G., Afouras, T., Torr, P. H. S., Kohli, P., & Whiteson, S. (2017). Stabilising experience replay for deep multi-agent reinforcement learning. In International conference on machine learning. – reference: BrownNSandholmTSuperhuman AI for heads-up no-limit poker: Libratus beats top professionalsScience2018359637441842437514621415.68163 – reference: Collaboration & Credit Principles, How can we be good stewards of collaborative trust? (2019). [Online]. Retrieved May 31, 2019, http://colah.github.io/posts/2019-05-Collaboration/index.html. – reference: Mordatch, I., & Abbeel, P. (2018). Emergence of grounded compositional language in multi-agent populations. In Thirty-second AAAI conference on artificial intelligence. – reference: Ortega, P. A., & Legg, S. (2018). Modeling friends and foes. arXiv:1807.00196 – reference: BrowneCBPowleyEWhitehouseDLucasSMCowlingPIRohlfshagenPTavenerSPerezDSamothrakisSColtonSA survey of Monte Carlo tree search methodsIEEE Transactions on Computational Intelligence and AI in Games201241143 – reference: Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. In Deep learning and representation learning workshop. – reference: Do I really have to cite an arXiv paper? (2017). [Online]. Retrieved May 21, 2019, http://approximatelycorrect.com/2017/08/01/do-i-have-to-cite-arxiv-paper/. – reference: Zhang, C., & Lesser, V. (2010). Multi-agent learning with policy prediction. In Twenty-fourth AAAI conference on artificial intelligence. – reference: Johanson, M., Bard, N., Burch, N., & Bowling, M. (2012). Finding optimal abstract strategies in extensive-form games. In Twenty-sixth AAAI conference on artificial intelligence. – reference: Omidshafiei, S., Hennes, D., Morrill, D., Munos, R., Perolat, J., Lanctot, M., Gruslys, A., Lespiau, J. B., & Tuyls, K. (2019). Neural replicator dynamics. arXiv e-prints arXiv:1906.00190. – reference: Lauer, M., & Riedmiller, M. (2000). An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In Proceedings of the seventeenth international conference on machine learning. – reference: Deep reinforcement learning: Pong from pixels. (2016). [Online]. Retrieved May 7, 2019, https://karpathy.github.io/2016/05/31/rl/. – reference: Johanson, M., Waugh, K., Bowling, M., & Zinkevich, M. (2011). Accelerating best response calculation in large extensive games. In Twenty-second international joint conference on artificial intelligence. – reference: Lazaridou, A., Peysakhovich, A., & Baroni, M. (2017). Multi-agent cooperation and the emergence of (natural) language. In International conference on learning representations. – reference: Castellini, J., Oliehoek, F. A., Savani, R., & Whiteson, S. (2019). The representational capacity of action-value networks for multi-agent reinforcement learning. In 18th International conference on autonomous agents and multiagent systems. – reference: Kulkarni, T. 
D., Narasimhan, K., Saeedi, A., & Tenenbaum, J. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in neural information processing systems (pp. 3675–3683). – reference: ShohamYPowersRGrenagerTIf multi-agent learning is the answer, what is the question?Artificial Intelligence2007171736537723322841168.68493 – reference: Todorov, E., Erez, T., & Tassa, Y. (2012). MuJoCo - A physics engine for model-based control. In Intelligent robots and systems( pp. 5026–5033). – reference: BloembergenDTuylsKHennesDKaisersMEvolutionary dynamics of multi-agent learning: A surveyJournal of Artificial Intelligence Research20155365969733895661336.68210 – reference: Azizzadenesheli, K. (2019). Maybe a few considerations in reinforcement learning research? In Reinforcement learning for real life workshop. – reference: Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd international conference on machine learning (pp. 448–456). – reference: Bono, G., Dibangoye, J. S., Matignon, L., Pereyron, F., & Simonin, O. (2018). Cooperative multi-agent policy gradient. In European conference on machine learning. – reference: Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., Czarnecki, W. M., Dudzik, A., Huang, A., Georgiev, P., Powell, R., Ewalds, T., Horgan, D., Kroiss, M., Danihelka, I., Agapiou, J., Oh, J., Dalibard, V., Choi, D., Sifre, L., Sulsky, Y., Vezhnevets, S., Molloy, J., Cai, T., Budden, D., Paine, T., Gulcehre, C., Wang, Z., Pfaff, T., Pohlen, T., Wu, Y., Yogatama, D., Cohen, J., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Apps, C., Kavukcuoglu, K., Hassabis, D., & Silver, D. (2019). AlphaStar: Mastering the real-time strategy game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/ – reference: Iba, H. (1996). Emergent cooperation for multiple agents using genetic programming. In International conference on parallel problem solving from nature (pp. 32–41). Springer. – reference: Song, Y., Wang, J., Lukasiewicz, T., Xu, Z., Xu, M., Ding, Z., & Wu, L. (2019). Arena: A general evaluation platform and building toolkit for multi-agent intelligence. CoRR arXiv:1905.08085. – reference: Guestrin, C., Koller, D., & Parr, R. (2002). Multiagent planning with factored MDPs. In Advances in neural information processing systems (pp. 1523–1530). – reference: Leibo, J. Z., Hughes, E., Lanctot, M., & Graepel, T. (2019). Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research. CoRR arXiv:1903.00742. – reference: BowlingMConvergence and no-regret in multiagent learningAdvances in neural information processing systems2004CanadaVancouver209216 – reference: Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy-based policies. In Proceedings of the 34th international conference on machine learning (Vol. 70, pp. 1352–1361). – reference: Hernandez-Leal, P., & Kaisers, M. (2017). Learning against sequential opponents in repeated stochastic games. In The 3rd multi-disciplinary conference on reinforcement learning and decision making. Ann Arbor. – reference: BrownGWIterative solution of games by fictitious playActivity Analysis of Production and Allocation1951131374376562650045.09902 – reference: Samvelyan, M., Rashid, T., de Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G. 
J., Hung, C., Torr, P. H. S., Foerster, J. N., & Whiteson, S. (2019). The StarCraft multi-agent challenge. CoRR arXiv:1902.04043. – reference: DarwicheAHuman-level intelligence or animal-like abilities?Communications of the ACM201861105667 – reference: SinghSJaakkolaTLittmanMLSzepesváriCConvergence results for single-step on-policy reinforcement-learning algorithmsMachine Learning20003832873080954.68127 – reference: TsitsiklisJAsynchronous stochastic approximation and Q-learningMachine Learning19941631852020820.68105 – reference: Forde, J. Z., & Paganini, M. (2019). The scientific method in the science of machine learning. In ICLR debugging machine learning models workshop. – reference: Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI gym. arXiv preprint arXiv:1606.01540. – reference: Rashid, T., Samvelyan, M., de Witt, C. S., Farquhar, G., Foerster, J. N., & Whiteson, S. (2018). QMIX - monotonic value function factorisation for deep multi-agent reinforcement learning. In International conference on machine learning. – reference: Rusu, A. A., Colmenarejo, S. G., Gulcehre, C., Desjardins, G., Kirkpatrick, J., Pascanu, R., Mnih, V., Kavukcuoglu, K., & Hadsell, R. (2016). Policy distillation. In International conference on learning representations. – reference: Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., & De Freitas, N. (2016). Dueling network architectures for deep reinforcement learning. In International conference on machine learning. – reference: Melo, F. S., Meyn, S. P., & Ribeiro, M. I. (2008). An analysis of reinforcement learning with function approximation. In Proceedings of the 25th international conference on Machine learning (pp. 664–671). ACM. – reference: Riedmiller, M. (2005). Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method. In European conference on machine learning (pp. 317–328). Springer. – reference: Hausknecht, M., & Stone, P. (2015). Deep recurrent Q-learning for partially observable MDPs. In International conference on learning representations. – reference: GuptaJKEgorovMKochenderferMSukthankarGRodriguez-AguilarJACooperative multi-agent control using deep reinforcement learningAutonomous agents and multiagent systems2017ChamSpringer6683 – reference: TuylsKWeissGMultiagent learning: Basics, challenges, and prospectsAI Magazine20123334152 – reference: CarmelDMarkovitchSIncorporating opponent models into adversary searchAAAI/IAAI19961120125 – reference: Foerster, J. N., Chen, R. Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., & Mordatch, I. (2018). Learning with opponent-learning awareness. In Proceedings of 17th international conference on autonomous agents and multiagent systems. Stockholm, Sweden. – reference: Srinivasan, S., Lanctot, M., Zambaldi, V., Pérolat, J., Tuyls, K., Munos, R., & Bowling, M. (2018). Actor-critic policy optimization in partially observable multiagent environments. In Advances in neural information processing systems (pp. 3422–3435). – reference: de WeerdHVerbruggeRVerheijBHow much does it help to know what she knows you know? An agent-based simulation studyArtificial Intelligence2013199–200C679230795661284.68567 – reference: BowlingMBurchNJohansonMTammelinOHeads-up limit hold’em poker is solvedScience20153476218145149 – reference: Walsh, W. E., Das, R., Tesauro, G., & Kephart, J. O. (2002). Analyzing complex strategic interactions in multi-agent systems. 
In AAAI-02 workshop on game-theoretic and decision-theoretic agents (pp. 109–118). – reference: Papoudakis, G., Christianos, F., Rahman, A., & Albrecht, S. V. (2019). Dealing with non-stationarity in multi-agent deep reinforcement learning. arXiv preprint arXiv:1906.04737. – reference: Schulman, J., Abbeel, P., & Chen, X. (2017) Equivalence between policy gradients and soft Q-learning. CoRR arXiv:1704.06440. – reference: LinLJSelf-improving reactive agents based on reinforcement learning, planning and teachingMachine Learning199283–4293321 – reference: Suddarth, S. C., & Kergosien, Y. (1990). Rule-injection hints as a means of improving network performance and learning time. In Neural networks (pp. 120–129). Springer. – reference: CrandallJWGoodrichMALearning to compete, coordinate, and cooperate in repeated games using reinforcement learningMachine Learning201182328131431081951237.68142 – reference: SamothrakisSLucasSRunarssonTRoblesDCoevolving game-playing agents: Measuring performance and intransitivitiesIEEE Transactions on Evolutionary Computation2013172213226 – reference: Zahavy, T., Ben-Zrihem, N., & Mannor, S. (2016). Graying the black box: Understanding DQNs. In International conference on machine learning (pp. 1899–1908). – reference: Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing function approximation error in actor-critic methods. In International conference on machine learning. – reference: Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2011). Protecting against evaluation overfitting in empirical reinforcement learning. In 2011 IEEE symposium on adaptive dynamic programming and reinforcement learning (ADPRL) (pp. 120–127). IEEE. – reference: Hernandez-Leal, P., Taylor, M. E., Rosman, B., Sucar, L. E., & Munoz de Cote, E. (2016). Identifying and tracking switching, non-stationary opponents: A Bayesian approach. In Multiagent interaction without prior coordination workshop at AAAI. Phoenix, AZ, USA. – reference: AstromKJOptimal control of Markov processes with incomplete state informationJournal of Mathematical Analysis and Applications19651011742051735700137.35803 – reference: Cassandra, A. R. (1998). Exact and approximate algorithms for partially observable Markov decision processes. Ph.D. thesis, Computer Science Department, Brown University. – reference: Sutton, R. S., McAllester, D. A., Singh, S. P., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems. – reference: Capture the Flag: The emergence of complex cooperative agents. (2018). [Online]. Retrieved September 7, 2018, https://deepmind.com/blog/capture-the-flag/ . – reference: GmytrasiewiczPJDoshiPA framework for sequential planning in multiagent settingsJournal of Artificial Intelligence Research200524149791080.68664 – reference: Kartal, B., Hernandez-Leal, P., & Taylor, M. E. (2019). Using Monte Carlo tree search as a demonstrator within asynchronous deep RL. In AAAI workshop on reinforcement learning in games. – reference: SandholmTWCritesRHMultiagent reinforcement learning in the iterated prisoner’s dilemmaBiosystems1996371–2147166 – reference: SchusterMPaliwalKKBidirectional recurrent neural networksIEEE Transactions on Signal Processing1997451126732681 – reference: WeiELukeSLenient learning in independent-learner stochastic cooperative gamesJournal of Machine Learning Research20161714235171071360.68720 – reference: Pinto, L., Davidson, J., Sukthankar, R., & Gupta, A. (2017). 
Robust adversarial reinforcement learning. In Proceedings of the 34th international conference on machine learning (Vol. 70, pp. 2817–2826). JMLR. org – reference: Herbrich, R., Minka, T., & Graepel, T. (2007). TrueSkillTM\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{{\rm TM}}$$\end{document}: a Bayesian skill rating system. In Advances in neural information processing systems (pp. 569–576). – reference: Clary, K., Tosch, E., Foley, J., & Jensen, D. (2018). Let’s play again: Variability of deep reinforcement learning agents in Atari environments. In NeurIPS critiquing and correcting trends workshop. – reference: OliehoekFAAmatoCA concise introduction to decentralized POMDPs2016BerlinSpringer1355.68005 – reference: VodopivecTSamothrakisSSterBOn Monte Carlo tree search and reinforcement learningJournal of Artificial Intelligence Research201760881936374205406825260 – reference: Wei, E., Wicke, D., Freelan, D., & Luke, S. (2018). Multiagent soft Q-learning. arXiv:1804.09817 – reference: AgoginoAKTumerKAnalyzing and visualizing multiagent rewards in dynamic and stochastic domainsAutonomous Agents and Multi-Agent Systems2008172320338 – reference: Liu, H., Feng, Y., Mao, Y., Zhou, D., Peng, J., & Liu, Q. (2018). Action-depedent control variates for policy optimization via stein’s identity. In International conference on learning representations. – reference: Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., & Graepel, T. (2018). The mechanics of n-player differentiable games. In Proceedings of the 35th international conference on machine learning, proceedings of machine learning research (pp. 354–363). Stockholm, Sweden. – reference: StrehlALLittmanMLAn analysis of model-based interval estimation for Markov decision processesJournal of Computer and System Sciences20087481309133124602871157.68059 – reference: WhitesonSStonePEvolutionary function approximation for reinforcement learningJournal of Machine Learning Research20067May87791722743901222.68330 – reference: Azizzadenesheli, K., Yang, B., Liu, W., Brunskill, E., Lipton, Z., & Anandkumar, A. (2018). Surprising negative results for generative adversarial tree search. In Critiquing and correcting trends in machine learning workshop. – reference: TampuuAMatiisenTKodeljaDKuzovkinIKorjusKAruJAruJVicenteRMultiagent cooperation and competition with deep reinforcement learningPLoS ONE2017124e0172395 – reference: Fulda, N., & Ventura, D. (2007). Predicting and preventing coordination problems in cooperative Q-learning systems. In Proceedings of the twentieth international joint conference on artificial intelligence (pp. 780–785). Hyderabad, India. – reference: GmytrasiewiczPJDurfeeEHRational coordination in multi-agent environmentsAutonomous Agents and Multi-Agent Systems200034319350 – reference: SchmidhuberJDeep learning in neural networks: An overviewNeural Networks20156185117 – reference: Bowling, M. (2000). Convergence problems of general-sum multiagent reinforcement learning. In International conference on machine learning (pp. 89–94). – reference: Oliehoek, F. A., Witwicki, S. J., & Kaelbling, L. P. (2012). Influence-based abstraction for multiagent systems. In Twenty-sixth AAAI conference on artificial intelligence. 
– reference: NowéAVrancxPDe HauwereYMWieringMvan OtterloMGame theory and multi-agent reinforcement learningReinforcement learning2012BerlinSpringer4414701216.68229 – reference: Amodei, D., & Hernandez, D. (2018). AI and compute. https://blog.openai.com/ai-and-compute. – reference: Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347. – reference: Schulman, J., Levine, S., Abbeel, P., Jordan, M. I., & Moritz, P. (2015). Trust region policy optimization. In 31st international conference on machine learning. Lille, France. – reference: Tesauro, G. (2003). Extending Q-learning to general adaptive multi-agent systems. In Advances in neural information processing systems (pp. 871–878). Vancouver, Canada. – reference: Singh, S., Kearns, M., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. In Proceedings of the sixteenth conference on uncertainty in artificial intelligence (pp. 541–548). Morgan Kaufmann Publishers Inc. – reference: Pyeatt, L. D., Howe, A. E., et al. (2001). Decision tree function approximation in reinforcement learning. In Proceedings of the third international symposium on adaptive systems: Evolutionary computation and probabilistic graphical models (Vol. 2, pp. 70–77). Cuba. – reference: Suarez, J., Du, Y., Isola, P., & Mordatch, I. (2019). Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents. CoRR arXiv:1903.00784. – reference: BeckerRZilbersteinSLesserVGoldmanCVSolving transition independent decentralized Markov decision processesJournal of Artificial Intelligence Research20042242345521294741080.68655 – reference: Heinrich, J., & Silver, D. (2016). Deep reinforcement learning from self-play in imperfect-information games. arXiv:1603.01121. – reference: Precup, D., Sutton, R. S., & Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the seventeenth international conference on machine learning. – reference: Gao, C., Hernandez-Leal, P., Kartal, B., & Taylor, M. E. (2019). Skynet: A top deep RL agent in the inaugural pommerman team competition. In 4th multidisciplinary conference on reinforcement learning and decision making. – reference: Guestrin, C., Lagoudakis, M., & Parr, R. (2002). Coordinated reinforcement learning. In ICML (Vol. 2, pp. 227–234). – reference: Van der Pol, E., & Oliehoek, F. A. (2016). Coordinated deep reinforcement learners for traffic light control. In Proceedings of learning, inference and control of multi-agent systems at NIPS. – reference: Gu, S. S., Lillicrap, T., Turner, R. E., Ghahramani, Z., Schölkopf, B., & Levine, S. (2017). Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning. In Advances in neural information processing systems (pp. 3846–3855). – reference: Meuleau, N., Peshkin, L., Kim, K. E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Proceedings of the fifteenth conference on uncertainty in artificial intelligence (pp. 427–436). – reference: Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928–1937). – reference: Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., & Kavukcuoglu, K. (2017). 
FeUdal networks for hierarchical reinforcement learning. In International conference on machine learning. – reference: Lerer, A., & Peysakhovich, A. (2017). Maintaining cooperation in complex social dilemmas using deep reinforcement learning. CoRR arXiv:1707.01068. – reference: Salimans, T., & Kingma, D. P. (2016). Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Advances in neural information processing systems (pp. 901–909). – reference: Song, X., Wang, T., & Zhang, C. (2019). Convergence of multi-agent learning with a finite step size in general-sum games. In 18th International conference on autonomous agents and multiagent systems. – reference: SpencerJRandomization, derandomization and antirandomization: three gamesTheoretical Computer Science1994131241542912889480805.90121 – reference: Gordon, G. J. (1999). Approximate solutions to Markov decision processes. Technical report, Carnegie-Mellon University. – reference: Lipton, Z. C., Azizzadenesheli, K., Kumar, A., Li, L., Gao, J., & Deng, L. (2018). Combating reinforcement learning’s Sisyphean curse with intrinsic fear. arXiv:1611.01211v8. – reference: Wolpert, D. H., & Tumer, K. (2002). Optimal payoff functions for members of collectives. In Modeling complexity in economic and social systems (pp. 355–369). – reference: HuJWellmanMPNash Q-learning for general-sum stochastic gamesThe Journal of Machine Learning Research200341039106921253451094.68076 – reference: Gu, S., Lillicrap, T., Ghahramani, Z., Turner, R. E., & Levine, S. (2017). Q-prop: Sample-efficient policy gradient with an off-policy critic. In International conference on learning representations. – reference: Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K. O., & Clune, J. (2019). Go-explore: A new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995. – reference: Johnson, M., Hofmann, K., Hutton, T., & Bignell, D. (2016). The Malmo platform for artificial intelligence experimentation. In IJCAI (pp. 4246–4247). – reference: Lanctot, M., Zambaldi, V. F., Gruslys, A., Lazaridou, A., Tuyls, K., Pérolat, J., Silver, D., & Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. In Advances in neural information processing systems. – reference: Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized experience replay. In International conference on learning representations. – reference: Hinton, G., Vinyals, O., & Dean, J. (2014). Distilling the knowledge in a neural network. In NIPS deep learning workshop. – reference: SilverDHuangAMaddisonCJGuezASifreLvan den DriesscheGSchrittwieserJAntonoglouIPanneershelvamVLanctotMDielemanSGreweDNhamJKalchbrennerNSutskeverILillicrapTLeachMKavukcuogluKGraepelTHassabisDMastering the game of go with deep neural networks and tree searchNature20165297587484489 – reference: EloAEThe rating of chessplayers, past and present1978NagoyaArco Pub. – reference: Wang, H., Raj, B., & Xing, E. P. (2017). On the origin of deep learning. CoRR arXiv:1702.07800. – reference: Panait, L., Sullivan, K., & Luke, S. (2006). Lenience towards teammates helps in cooperative multiagent learning. In Proceedings of the 5th international conference on autonomous agents and multiagent systems. Hakodate, Japan. – reference: HochreiterSSchmidhuberJLong short-term memoryNeural Computation19979817351780 – reference: BowlingMVelosoMMultiagent learning using a variable learning rateArtificial Intelligence2002136221525018958190995.68075 – reference: Lin, L. J. 
(1991). Programming robots using reinforcement learning and teaching. In AAAI (pp. 781–786). – reference: KalaiELehrerERational learning leads to Nash equilibriumEconometrica: Journal of the Econometric Society1993611019104512347920793.90106 – reference: Leibo, J. Z., Perolat, J., Hughes, E., Wheelwright, S., Marblestone, A. H., Duéñez-Guzmán, E., Sunehag, P., Dunning, I., & Graepel, T. (2019). Malthusian reinforcement learning. In 18th international conference on autonomous agents and multiagent systems. – reference: Costa GomesMCrawfordVPBrosetaBCognition and behavior in normal-form games: An experimental studyEconometrica200169511931235 – reference: OpenAI Baselines: ACKTR & A2C. (2017). [Online]. Retrieved April 29, 2019, https://openai.com/blog/baselines-acktr-a2c/ . – reference: Barrett, S., Stone, P., Kraus, S., & Rosenfeld, A. (2013). Teamwork with Limited Knowledge of Teammates. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, pp. 102–108. Bellevue, WS, USA. – reference: Liu, S., Lever, G., Merel, J., Tunyasuvunakool, S., Heess, N., & Graepel, T. (2019). Emergent coordination through competition. In International conference on learning representations. – reference: Lipton, Z. C., & Steinhardt, J. (2018). Troubling trends in machine learning scholarship. In ICML Machine Learning Debates workshop. – reference: SzepesváriCLittmanMLA unified analysis of value-function-based reinforcement-learning algorithmsNeural Computation199911820172060 – reference: Tamar, A., Levine, S., Abbeel, P., Wu, Y., & Thomas, G. (2016). Value iteration networks. In NIPS (pp. 2154–2162). – reference: LaurentGJMatignonLFort-PiatLThe world of independent learners is not MarkovianInternational Journal of Knowledge-based and Intelligent Engineering Systems20111515564 – reference: Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In International conference on learning representations. – reference: OliehoekFASpaanMTVlassisNOptimal and approximate Q-value functions for decentralized POMDPsJournal of Artificial Intelligence Research20083228935324207381182.68261 – reference: Ilyas, A., Engstrom, L., Santurkar, S., Tsipras, D., Janoos, F., Rudolph, L., & Madry, A. (2018). Are deep policy gradient algorithms truly policy gradient algorithms? CoRR arXiv:1811.02553. – reference: SzepesváriCAlgorithms for reinforcement learningSynthesis Lectures on Artificial Intelligence and Machine Learning20104111031205.68320 – reference: De JongKAEvolutionary computation: A unified approach2006CambridgeMIT press1106.68093 – reference: Konidaris, G., & Barto, A. (2006). Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd international conference on machine learning (pp. 489–496). ACM. – reference: Heess, N., TB, D., Sriram, S., Lemmon, J., Merel, J., Wayne, G., Tassa, Y., Erez, T., Wang, Z., Eslami, S. M. A., Riedmiller, M. A., & Silver, D. (2017). Emergence of locomotion behaviours in rich environments. arXiv:1707.02286v2 – reference: Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., & Dunning, I., et al. (2018). IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In International conference on machine learning. – reference: Hasselt, H. V. (2010). Double Q-learning. In Advances in neural information processing systems (pp. 2613–2621). 
– reference: MondererDShapleyLSFictitious play property for games with identical interestsJournal of Economic Theory199668125826513724000849.90130 – reference: Pesce, E., & Montana, G. (2019). Improving coordination in multi-agent deep reinforcement learning through memory-driven communication. CoRR arXiv:1901.03887. – reference: PowersRShohamYVuTA general criterion and an algorithmic framework for learning in multi-agent systemsMachine Learning2007671–24576 – reference: Damer, S., & Gini, M. (2017). Safely using predictions in general-sum normal form games. In Proceedings of the 16th conference on autonomous agents and multiagent systems. Sao Paulo. – reference: HauskrechtMValue-function approximations for partially observable Markov decision processesJournal of Artificial Intelligence Research2000131339417818620946.68131 – reference: MooreAWAtkesonCGPrioritized sweeping: Reinforcement learning with less data and less timeMachine Learning1993131103130 – reference: ErdösPSelfridgeJLOn a combinatorial gameJournal of Combinatorial Theory, Series A19731432983013273130293.05004 – reference: SilverDSchrittwieserJSimonyanKAntonoglouIHuangAGuezAHubertTBakerLLaiMBoltonAMastering the game of go without human knowledgeNature20175507676354 – reference: Albrecht, S. V., & Ramamoorthy, S. (2013). A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems. In Proceedings of the 12th international conference on autonomous agents and multi-agent systems. Saint Paul, MN, USA. – reference: Multiagent Learning, Foundations and Recent Trends. (2017). [Online]. Retrieved September 7, 2018, https://www.cs.utexas.edu/~larg/ijcai17_tutorial/multiagent_learning.pdf . – reference: Resnick, C., Eldridge, W., Ha, D., Britz, D., Foerster, J., Togelius, J., Cho, K., & Bruna, J. (2018). Pommerman: A multi-agent playground. arXiv:1809.07124. – reference: Watkins, J. (1989). Learning from delayed rewards. Ph.D. thesis, King’s College, Cambridge, UK – reference: SilvaFLCostaAHRA survey on transfer learning for multiagent reinforcement learning systemsJournal of Artificial Intelligence Research201964645703393255907037594 – reference: WilliamsRJSimple statistical gradient-following algorithms for connectionist reinforcement learningMachine Learning199283–42292560772.68076 – reference: Raghu, M., Irpan, A., Andreas, J., Kleinberg, R., Le, Q., & Kleinberg, J. (2018). Can deep reinforcement learning solve Erdos–Selfridge-spencer games? In Proceedings of the 35th international conference on machine learning. – reference: ConitzerVSandholmTAWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponentsMachine Learning2006671–22343 – reference: Bowling, M., & McCracken, P. (2005). Coordination and adaptation in impromptu teams. Proceedings of the nineteenth conference on artificial intelligence (Vol. 5, pp. 53–58). – reference: Ling, C. K., Fang, F., & Kolter, J. Z. (2018). What game are we playing? End-to-end learning in normal and extensive form games. In Twenty-seventh international joint conference on artificial intelligence. – reference: Devlin, S., Yliniemi, L. M., Kudenko, D., & Tumer, K. (2014). Potential-based difference rewards for multiagent reinforcement learning. In 13th International conference on autonomous agents and multiagent systems, AAMAS 2014. Paris, France. – reference: Firoiu, V., Whitney, W. F., & Tenenbaum, J. B. (2017). Beating the World’s best at super smash Bros. 
with deep reinforcement learning. CoRR arXiv:1702.06230. – reference: Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). Convergence of stochastic iterative dynamic programming algorithms. In Advances in neural information processing systems (pp. 703–710) – reference: Zheng, Y., Hao, J., & Zhang, Z. (2018). Weighted double deep multiagent reinforcement learning in stochastic cooperative environments. arXiv:1802.08534. – reference: Cuccu, G., Togelius, J., & Cudré-Mauroux, P. (2019). Playing Atari with six neurons. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 998–1006). International Foundation for Autonomous Agents and Multiagent Systems. – reference: Grover, A., Al-Shedivat, M., Gupta, J. K., Burda, Y., & Edwards, H. (2018). Learning policy representations in multiagent systems. In International conference on machine learning. – reference: Banerjee, B., & Peng, J. (2003). Adaptive policy gradient in multiagent learning. In Proceedings of the second international joint conference on Autonomous agents and multiagent systems (pp. 686–692). ACM. – reference: Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., & Kavukcuoglu, K. (2017). Reinforcement learning with unsupervised auxiliary tasks. In International conference on learning representations. – reference: Kaisers, M., & Tuyls, K. (2011). FAQ-learning in matrix games: demonstrating convergence near Nash equilibria, and bifurcation of attractors in the battle of sexes. In AAAI Workshop on Interactive Decision Theory and Game Theory (pp. 309–316). San Francisco, CA, USA. – reference: TesauroGTemporal difference learning and TD-GammonCommunications of the ACM19953835868 – reference: Yu, Y. (2018). Towards sample efficient reinforcement learning. In IJCAI (pp. 5739–5743). – reference: Schmidhuber, J. (1991). A possibility for implementing curiosity and boredom in model-building neural controllers. In Proceedings of the international conference on simulation of adaptive behavior: From animals to animats (pp. 222–227). – reference: Dayan, P., & Hinton, G. E. (1993). Feudal reinforcement learning. In Advances in neural information processing systems (pp. 271–278). – reference: Kamihigashi, T., & Le Van, C. (2015). Necessary and sufficient conditions for a solution of the bellman equation to be the value function: A general principle. https://halshs.archives-ouvertes.fr/halshs-01159177 – reference: ChakrabortyDStonePMultiagent learning in the presence of memory-bounded agentsAutonomous Agents and Multi-Agent Systems2013282182213 – reference: Bacchiani, G., Molinari, D., & Patander, M. (2019). Microscopic traffic simulation by cooperative multi-agent deep reinforcement learning. In AAMAS. – reference: Bull, L. (1998). Evolutionary computing in multi-agent environments: Operators. In International conference on evolutionary programming (pp. 43–52). Springer. – reference: Omidshafiei, S., Pazis, J., Amato, C., How, J. P., & Vian, J. (2017). Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In Proceedings of the 34th international conference on machine learning. Sydney. – reference: MahadevanSConnellJAutomatic programming of behavior-based robots using reinforcement learningArtificial Intelligence1992552–3311365 – reference: Rabinowitz, N. C., Perbet, F., Song, H. F., Zhang, C., Eslami, S. M. A., & Botvinick, M. (2018). Machine theory of mind. In International conference on machine learning. Stockholm, Sweden. 
– reference: Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th national conference on artificial intelligence (pp. 746–752). Madison, Wisconsin, USA. – reference: NashJFEquilibrium points in n-person gamesProceedings of the National Academy of Sciences19503614849317010036.01104 – reference: Bellemare, M. G., Dabney, W., Dadashi, R., Taïga, A. A., Castro, P. S., & Roux, N. L., et al. (2019). A geometric perspective on optimal representations for reinforcement learning. CoRR arXiv:1901.11530. – reference: MoriartyDESchultzACGrefenstetteJJEvolutionary algorithms for reinforcement learningJournal of Artificial Intelligence Research1999112412760924.68157 – reference: PutermanMLMarkov decision processes: Discrete stochastic dynamic programming1994New YorkWiley0829.90134 – reference: Shelhamer, E., Mahmoudieh, P., Argus, M., & Darrell, T. (2017). Loss is its own reward: Self-supervision for reinforcement learning. In ICLR workshops. – reference: Castro, P. S., Moitra, S., Gelada, C., Kumar, S., Bellemare, M. G. (2018). Dopamine: A research framework for deep reinforcement learning. arXiv:1812.06110. – reference: MatignonLLaurentGJLe Fort-PiatNIndependent reinforcement learners in cooperative Markov games: A survey regarding coordination problemsKnowledge Engineering Review2012271131 – reference: Ng, A. Y., Harada, D., & Russell, S. J. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the sixteenth international conference on machine learning (pp. 278–287). – reference: Schmidhuber, J. (2015). Critique of Paper by “Deep Learning Conspiracy” (Nature 521 p 436). http://people.idsia.ch/~juergen/deep-learning-conspiracy.html. – reference: Even-DarEMansourYLearning rates for Q-learningJournal of Machine Learning Research20035Dec12522479721222.68196 – reference: BartoAGMirolliMBaldassarreGIntrinsic motivation and reinforcement learningIntrinsically motivated learning in natural and artificial systems2013BerlinSpringer1747 – reference: Oliehoek, F. A. (2018). Interactive learning and decision making - foundations, insights & challenges. In International joint conference on artificial intelligence. – reference: Pérolat, J., Piot, B., & Pietquin, O. (2018). Actor-critic fictitious play in simultaneous move multistage games. In 21st international conference on artificial intelligence and statistics. – reference: Gullapalli, V., & Barto, A. G. (1992). Shaping as a method for accelerating reinforcement learning. In Proceedings of the 1992 IEEE international symposium on intelligent control (pp. 554–559). IEEE. – reference: Li, Y. (2017). Deep reinforcement learning: An overview. CoRR arXiv:1701.07274. 
| StartPage | 750 |
| SubjectTerms | Artificial Intelligence; Computer Science; Computer Systems Organization and Communication Networks; Domains; Machine learning; Multiagent systems; New Horizons in Multiagent Learning; Software Engineering/Programming and Operating Systems; User Interfaces and Human Computer Interaction
| Title | A survey and critique of multiagent deep reinforcement learning |
| URI | https://link.springer.com/article/10.1007/s10458-019-09421-1 https://www.proquest.com/docview/2307168851 |
| Volume | 33 |