A TD3-based multi-agent deep reinforcement learning method in mixed cooperation-competition environment
| Published in: | Neurocomputing (Amsterdam), Volume 411, pp. 206-215 |
|---|---|
| Main authors: | Zhang, Fengjiao; Li, Jie; Li, Zhi |
| Format: | Journal Article |
| Language: | English |
| Publisher: | Elsevier B.V., 21 October 2020 |
| ISSN: | 0925-2312 (print); 1872-8286 (online) |
| Online access: | Get full text |

| Abstract | We study the problems of function approximation error and adaptability to complex missions in multi-agent deep reinforcement learning. This paper proposes a new multi-agent deep reinforcement learning framework named multi-agent twin delayed deep deterministic policy gradient (MATD3). Our method reduces the overestimation error of the neural-network approximation and the variance of the resulting estimates by using a dual-centered critic, group target network smoothing, and delayed policy updating; according to the experimental results, this ultimately improves the agents' ability to adapt to complex missions. We then show that existing multi-agent algorithms suffer from an unavoidable overestimation issue when approximating the true action-value function with a neural network, and we analyze this approximation error in the multi-agent deep deterministic policy gradient (MADDPG) algorithm both mathematically and experimentally. Finally, applying our algorithm in a mixed cooperative-competitive experimental environment further demonstrates its effectiveness and generalization, in particular improving the group's ability to adapt to complex missions and to complete more difficult ones. |
|---|---|
| Authors and affiliations | Zhang, Fengjiao (Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu, Sichuan, China); Li, Jie (Center of Data Science and Service, Computer Science Institute, Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Haidian District, Beijing, China); Li, Zhi (Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu, Sichuan, China; lizhi@scu.edu.cn) |
| Copyright | 2020 Elsevier B.V. |
| DOI | 10.1016/j.neucom.2020.05.097 |
| Keywords | Dual-critic; MADDPG; MATD3; Overestimation error; Reinforcement learning |
| PageCount | 10 |
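The abstract names the three TD3-style mechanisms that the framework carries into the multi-agent, centralized-critic setting: a pair of critics whose minimum forms the learning target, smoothing noise on the target actions of the whole group, and policy updates that happen less often than critic updates. The sketch below illustrates how such a clipped double-critic target could be computed; it is a minimal illustration under our own assumptions (NumPy, callable target policies and critics, actions bounded in [-1, 1], assumed hyper-parameter values), not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only: the bootstrap target a TD3-style multi-agent critic
# update might use. Function names, signatures and hyper-parameters are
# assumptions for this example, not taken from the paper.

def smoothed_group_target_actions(target_policies, next_obs,
                                  noise_std=0.2, noise_clip=0.5):
    """Target actions of every agent, perturbed by clipped Gaussian smoothing noise."""
    actions = []
    for policy, obs in zip(target_policies, next_obs):
        a = policy(obs)
        noise = np.clip(np.random.normal(0.0, noise_std, size=a.shape),
                        -noise_clip, noise_clip)
        actions.append(np.clip(a + noise, -1.0, 1.0))  # keep actions in bounds
    # Joint action fed to the centralized critics.
    return np.concatenate(actions, axis=-1)

def clipped_double_q_target(reward, done, next_state, next_joint_action,
                            target_q1, target_q2, gamma=0.95):
    """Target using the minimum of two target critics to curb overestimation."""
    q1 = target_q1(next_state, next_joint_action)
    q2 = target_q2(next_state, next_joint_action)
    return reward + gamma * (1.0 - done) * np.minimum(q1, q2)
```

In such a scheme both critics are regressed toward this shared target at every step, while each agent's actor and all target networks are refreshed only every few critic steps, which corresponds to the delayed policy updating described in the abstract.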