Online reinforcement learning multiplayer non-zero sum games of continuous-time Markov jump linear systems

• A novel online mode-free integral reinforcement learning algorithm is proposed to solve multiplayer non-zero sum games.
• Online learning is used to compute the corresponding N coupled algebraic Riccati equations.
• The policy iteration algorithm is applied to solve the coupled algebraic Riccati...

Bibliographic Details
Published in: Applied Mathematics and Computation, Vol. 412; p. 126537
Main Authors: Xin, Xilin, Tu, Yidong, Stojanovic, Vladimir, Wang, Hai, Shi, Kaibo, He, Shuping, Pan, Tianhong
Format: Journal Article
Language: English
Published: Elsevier Inc 01.01.2022
ISSN: 0096-3003, 1873-5649
Abstract
• A novel online mode-free integral reinforcement learning algorithm is proposed to solve multiplayer non-zero sum games.
• Online learning is used to compute the corresponding N coupled algebraic Riccati equations.
• The policy iteration algorithm is applied to solve the coupled algebraic Riccati equations corresponding to the multiplayer non-zero sum games.

In this paper, a novel online mode-free integral reinforcement learning algorithm is proposed to solve multiplayer non-zero sum games. We first collect the state and input information of the subsystems and learn from it; we then use online learning to compute the solutions of the corresponding N coupled algebraic Riccati equations. The policy iteration algorithm proposed in this paper can solve the coupled algebraic Riccati equations corresponding to the multiplayer non-zero sum games. Finally, the effectiveness and feasibility of the design method are demonstrated by a simulation example with three players.
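The abstract's policy iteration on coupled algebraic Riccati equations can be illustrated with a much simpler sketch. The Python below is not the authors' algorithm: it is model-based and offline, it handles a single mode with no Markov jumps, and it assumes each player's cost penalizes only its own input. The system matrices, weights, and function names are invented for illustration, and convergence of this simultaneous update is not guaranteed in general; it is only expected here because the made-up system is stable and weakly coupled.

```python
import numpy as np

def lyap(a_cl, m):
    """Solve a_cl.T @ P + P @ a_cl = -m for P via row-major vectorisation."""
    n = a_cl.shape[0]
    eye = np.eye(n)
    lhs = np.kron(a_cl.T, eye) + np.kron(eye, a_cl.T)
    return np.linalg.solve(lhs, -m.flatten()).reshape(n, n)

def closed_loop(A, Bs, Ks):
    """Closed-loop matrix A - sum_j B_j K_j under feedback u_j = -K_j x."""
    return A - sum(B @ K for B, K in zip(Bs, Ks))

def gains_from(Ps, Bs, Rs):
    """Policy improvement: K_i = R_i^{-1} B_i^T P_i for every player."""
    return [np.linalg.solve(R, B.T @ P) for P, B, R in zip(Ps, Bs, Rs)]

def policy_iteration(A, Bs, Qs, Rs, iters=200, tol=1e-12):
    """Alternate policy evaluation (Lyapunov solves) and policy improvement."""
    n = A.shape[0]
    # Zero gains are an admissible start only because the assumed A is stable.
    Ks = [np.zeros((B.shape[1], n)) for B in Bs]
    Ps = []
    for _ in range(iters):
        a_cl = closed_loop(A, Bs, Ks)
        Ps = [lyap(a_cl, Q + K.T @ R @ K) for Q, R, K in zip(Qs, Rs, Ks)]
        new_Ks = gains_from(Ps, Bs, Rs)
        change = max(np.abs(Kn - K).max() for Kn, K in zip(new_Ks, Ks))
        Ks = new_Ks
        if change < tol:
            break
    return Ps, Ks

def are_residuals(A, Bs, Qs, Rs, Ps):
    """Left-hand sides of the N coupled Riccati equations (zero at a solution)."""
    Ks = gains_from(Ps, Bs, Rs)
    a_cl = closed_loop(A, Bs, Ks)
    return [a_cl.T @ P + P @ a_cl + Q + K.T @ R @ K
            for P, Q, R, K in zip(Ps, Qs, Rs, Ks)]

# Hypothetical three-player example (all numbers are assumptions).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
Bs = [np.array([[0.5], [0.0]]), np.array([[0.0], [0.5]]), np.array([[0.3], [0.3]])]
Qs = [np.eye(2), 2.0 * np.eye(2), 0.5 * np.eye(2)]
Rs = [np.eye(1)] * 3

Ps, Ks = policy_iteration(A, Bs, Qs, Rs)
residuals = are_residuals(A, Bs, Qs, Rs, Ps)
A_cl = closed_loop(A, Bs, Ks)
print("largest Riccati residual:", max(np.abs(r).max() for r in residuals))
```

Each pass evaluates the current feedback gains by solving one Lyapunov equation per player and then updates every gain from its own cost matrix. An online, mode-free method such as the one the abstract describes would instead estimate these quantities from measured state and input data.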
ArticleNumber 126537
Author_xml – sequence: 1
  givenname: Xilin
  surname: Xin
  fullname: Xin, Xilin
  organization: Key Laboratory of Intelligent Computing and Signal Processing (Ministry of Education), School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
– sequence: 2
  givenname: Yidong
  surname: Tu
  fullname: Tu, Yidong
  organization: Key Laboratory of Intelligent Computing and Signal Processing (Ministry of Education), School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
– sequence: 3
  givenname: Vladimir
  surname: Stojanovic
  fullname: Stojanovic, Vladimir
  organization: Department of Automatic Control, Robotics and Fluid Technique, Faculty of Mechanical and Civil Engineering, University of Kragujevac, Kraljevo 36000, Serbia
– sequence: 4
  givenname: Hai
  surname: Wang
  fullname: Wang, Hai
  organization: Discipline of Engineering and Energy, Murdoch University, 90 South Street, Murdoch, WA 6150, Australia
– sequence: 5
  givenname: Kaibo
  surname: Shi
  fullname: Shi, Kaibo
  organization: School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu 610106, China
– sequence: 6
  givenname: Shuping
  surname: He
  fullname: He, Shuping
  email: shuping.he@ahu.edu.cn
  organization: Key Laboratory of Intelligent Computing and Signal Processing (Ministry of Education), School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
– sequence: 7
  givenname: Tianhong
  surname: Pan
  fullname: Pan, Tianhong
  email: hpan@ahu.edu.cn
  organization: Key Laboratory of Intelligent Computing and Signal Processing (Ministry of Education), School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
ContentType Journal Article
Copyright 2021
DOI 10.1016/j.amc.2021.126537
Discipline Mathematics
EISSN 1873-5649
ISICitedReferencesCount 150
IsPeerReviewed true
IsScholarly true
Keywords Coupled algebraic Riccati equations
Markov jump linear systems
Multiplayer non-zero sum games
Reinforcement learning