A TD3-based multi-agent deep reinforcement learning method in mixed cooperation-competition environment

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 411, pp. 206-215
Main Authors: Zhang, Fengjiao; Li, Jie; Li, Zhi
Format: Journal Article
Language: English
Published: Elsevier B.V., 21 October 2020
Subjects: Dual-critic; MADDPG; MATD3; Overestimation error; Reinforcement learning
ISSN: 0925-2312 (print); 1872-8286 (online)
Online Access: https://doi.org/10.1016/j.neucom.2020.05.097
Abstract: We study the problems of function approximation error and adaptability to complex missions in multi-agent deep reinforcement learning. This paper proposes a new multi-agent deep reinforcement learning framework named multi-agent twin delayed deep deterministic policy gradient (MATD3). Our method reduces the overestimation error of the neural-network approximation and the variance of the resulting value estimates through a dual-centered critic, group target-network smoothing, and delayed policy updates; experiments show that this in turn improves the agents' ability to adapt to complex missions. We then show that existing multi-agent algorithms that approximate the true action-value function with neural networks suffer from an unavoidable overestimation problem, and we analyze the approximation error of the multi-agent deep deterministic policy gradient (MADDPG) algorithm both mathematically and experimentally. Finally, applying our algorithm in mixed cooperative-competitive experimental environments further demonstrates its effectiveness and generalization, in particular the improved ability of agent groups to adapt to complex missions and to complete more difficult ones.
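For readers who want a concrete picture of the mechanisms named in the abstract, the sketch below illustrates a TD3-style target computation for a centralized multi-agent critic: twin ("dual") critics whose elementwise minimum forms the bootstrap target, and target-action smoothing with clipped Gaussian noise. This is a minimal sketch under our own assumptions (PyTorch, placeholder names such as CentralizedCritic, clipped_double_q_target, noise_std, noise_clip, and the network sizes), not the authors' implementation.

# Minimal sketch (not the authors' code) of a TD3-style clipped double-Q target
# for a centralized multi-agent critic, as described in the abstract. All names,
# shapes, and hyperparameter values below are illustrative assumptions.
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Q-network that conditions on the joint observation and joint action."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

def clipped_double_q_target(reward, done, next_obs_per_agent, target_actors,
                            target_q1, target_q2, gamma=0.95,
                            noise_std=0.2, noise_clip=0.5):
    """y = r + gamma * (1 - done) * min(Q1', Q2')(o', a' + clipped noise).

    reward and done are [batch, 1] tensors; next_obs_per_agent is a list of
    per-agent observation tensors; target_actors is a matching list of policies.
    """
    with torch.no_grad():
        next_actions = []
        for actor, obs_i in zip(target_actors, next_obs_per_agent):
            a_i = actor(obs_i)
            # Target-policy smoothing: perturb each target action with clipped noise.
            noise = torch.clamp(noise_std * torch.randn_like(a_i), -noise_clip, noise_clip)
            next_actions.append(torch.clamp(a_i + noise, -1.0, 1.0))
        joint_obs = torch.cat(next_obs_per_agent, dim=-1)
        joint_act = torch.cat(next_actions, dim=-1)
        # Taking the minimum of the two target critics counteracts overestimation bias.
        q_min = torch.min(target_q1(joint_obs, joint_act), target_q2(joint_obs, joint_act))
        return reward + gamma * (1.0 - done) * q_min

In a full training loop, the actors and target networks would be updated only once every few critic updates, which corresponds to the "delayed policy updating" the abstract mentions; the exact update schedule and noise parameters used in the paper may differ from the placeholders above.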
Author details:
– Zhang, Fengjiao. Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu, Sichuan, China
– Li, Jie. The Center of Data Science and Service, Computer Science Institute, Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Haidian District, Beijing, China
– Li, Zhi (lizhi@scu.edu.cn). Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu, Sichuan, China
Copyright 2020 Elsevier B.V.
DOI 10.1016/j.neucom.2020.05.097
Discipline Computer Science
IsPeerReviewed true
IsScholarly true
PageCount 10