Online reinforcement learning multiplayer non-zero sum games of continuous-time Markov jump linear systems
• A novel online mode-free integral reinforcement learning algorithm is proposed to solve the multiplayer non-zero sum games. • The online learning is used to compute the corresponding N coupled algebraic Riccati equations. • The policy iteration algorithm is applied to solve the coupled algebraic Riccati...
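| Published in: | Applied mathematics and computation Vol. 412; p. 126537 |
|---|---|
| Main Authors: | Xin, Xilin; Tu, Yidong; Stojanovic, Vladimir; Wang, Hai; Shi, Kaibo; He, Shuping; Pan, Tianhong |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Inc, 01.01.2022 |
| Subjects: | Coupled algebraic Riccati equations; Markov jump linear systems; Multiplayer non-zero sum games; Reinforcement learning |
| ISSN: | 0096-3003, 1873-5649 |
| Online Access: | Full text |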
| Abstract | • A novel online mode-free integral reinforcement learning algorithm is proposed to solve the multiplayer non-zero sum games. • The online learning is used to compute the corresponding N coupled algebraic Riccati equations. • The policy iteration algorithm is applied to solve the coupled algebraic Riccati equations corresponding to the multiplayer non-zero sum games.
In this paper, a novel online mode-free integral reinforcement learning algorithm is proposed to solve multiplayer non-zero sum games. We first collect and learn the state and input information of the subsystems; we then use online learning to compute the corresponding N coupled algebraic Riccati equations. The policy iteration algorithm proposed in this paper solves the coupled algebraic Riccati equations corresponding to the multiplayer non-zero sum games. Finally, the effectiveness and feasibility of the design method are demonstrated by a simulation example with three players. |
|---|---|
| ArticleNumber | 126537 |
| Author | He, Shuping; Xin, Xilin; Tu, Yidong; Wang, Hai; Pan, Tianhong; Shi, Kaibo; Stojanovic, Vladimir |
| Author_xml | – sequence: 1 givenname: Xilin surname: Xin fullname: Xin, Xilin organization: Key Laboratory of Intelligent Computing and Signal Processing (Ministry of Education), School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China – sequence: 2 givenname: Yidong surname: Tu fullname: Tu, Yidong organization: Key Laboratory of Intelligent Computing and Signal Processing (Ministry of Education), School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China – sequence: 3 givenname: Vladimir surname: Stojanovic fullname: Stojanovic, Vladimir organization: Department of Automatic Control, Robotics and Fluid Technique, Faculty of Mechanical and Civil Engineering, University of Kragujevac, Kraljevo 36000, Serbia – sequence: 4 givenname: Hai surname: Wang fullname: Wang, Hai organization: Discipline of Engineering and Energy, Murdoch University, 90 South Street, Murdoch, WA 6150, Australia – sequence: 5 givenname: Kaibo surname: Shi fullname: Shi, Kaibo organization: School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu 610106, China – sequence: 6 givenname: Shuping surname: He fullname: He, Shuping email: shuping.he@ahu.edu.cn organization: Key Laboratory of Intelligent Computing and Signal Processing (Ministry of Education), School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China – sequence: 7 givenname: Tianhong surname: Pan fullname: Pan, Tianhong email: hpan@ahu.edu.cn organization: Key Laboratory of Intelligent Computing and Signal Processing (Ministry of Education), School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China |
| CitedBy_id | crossref_primary_10_1109_TITS_2025_3546612 crossref_primary_10_1109_TASE_2023_3279829 crossref_primary_10_1016_j_automatica_2023_111101 crossref_primary_10_1109_ACCESS_2022_3198968 crossref_primary_10_1109_TFUZZ_2023_3265666 crossref_primary_10_1109_TASE_2023_3299275 crossref_primary_10_1016_j_engappai_2023_105959 crossref_primary_10_1016_j_neunet_2023_06_015 crossref_primary_10_1016_j_neunet_2023_02_045 crossref_primary_10_1038_s41598_024_56497_1 crossref_primary_10_1016_j_neunet_2022_11_008 crossref_primary_10_1109_TASE_2023_3234961 crossref_primary_10_1109_TSP_2021_3130967 crossref_primary_10_1109_TSP_2022_3176109 crossref_primary_10_1016_j_neunet_2022_05_017 crossref_primary_10_1016_j_automatica_2025_112591 crossref_primary_10_1016_j_engappai_2023_106050 crossref_primary_10_1016_j_measurement_2022_112356 crossref_primary_10_1109_TCSI_2024_3387914 crossref_primary_10_1016_j_neunet_2023_02_033 crossref_primary_10_1109_TII_2024_3465601 crossref_primary_10_1002_mma_10143 crossref_primary_10_1007_s11063_023_11209_0 crossref_primary_10_1016_j_ins_2022_10_022 crossref_primary_10_1109_TSMC_2024_3405023 crossref_primary_10_1109_TASE_2024_3506592 crossref_primary_10_1109_TNNLS_2024_3487760 crossref_primary_10_1007_s00521_021_06652_w crossref_primary_10_1007_s12555_021_0675_y crossref_primary_10_1016_j_neunet_2022_05_002 crossref_primary_10_1109_TSMC_2024_3462762 crossref_primary_10_1109_TCSII_2022_3233790 crossref_primary_10_1007_s40815_022_01291_2 crossref_primary_10_1109_TSP_2023_3274937 crossref_primary_10_1016_j_automatica_2024_111886 crossref_primary_10_1109_JAS_2023_123960 crossref_primary_10_1088_1361_6501_ad22cc crossref_primary_10_1109_TCYB_2025_3538787 crossref_primary_10_1109_TAI_2024_3415550 crossref_primary_10_1016_j_conengprac_2022_105257 crossref_primary_10_1016_j_eswa_2023_119774 crossref_primary_10_1016_j_neunet_2022_06_032 crossref_primary_10_1155_2022_3205960 crossref_primary_10_1016_j_neunet_2023_07_009 
crossref_primary_10_1109_ACCESS_2023_3313411 crossref_primary_10_1007_s10846_022_01702_4 crossref_primary_10_1109_TCSII_2022_3199246 crossref_primary_10_1007_s13042_023_01845_2 crossref_primary_10_1002_rnc_7501 crossref_primary_10_1007_s40747_023_01068_6 crossref_primary_10_1007_s13042_022_01614_7 crossref_primary_10_1007_s10489_023_05050_0 crossref_primary_10_1109_TNNLS_2022_3186229 crossref_primary_10_1007_s11424_025_4535_3 crossref_primary_10_1109_TSMC_2022_3189771 crossref_primary_10_1109_JSYST_2025_3533880 crossref_primary_10_3390_s23218740 crossref_primary_10_1007_s00170_022_09535_z crossref_primary_10_1080_00051144_2023_2203552 crossref_primary_10_1016_j_ins_2022_12_073 crossref_primary_10_1016_j_amc_2024_128803 crossref_primary_10_1016_j_ins_2022_09_029 crossref_primary_10_1007_s11063_022_11109_9 crossref_primary_10_1007_s40747_023_00995_8 crossref_primary_10_1109_TCYB_2022_3220537 |
| Cites_doi | 10.1016/j.automatica.2021.109590 10.1109/JSYST.2019.2891520 10.1016/j.automatica.2012.06.096 10.3182/20130204-3-FR-2033.00186 10.1109/TCYB.2016.2600753 10.1109/JSYST.2013.2265187 10.1016/j.automatica.2008.08.017 10.1080/00207177508922037 10.1109/TNNLS.2014.2316245 10.1016/j.jfranklin.2020.08.037 10.1109/MCAS.2009.933854 10.1109/TAC.2017.2660240 10.1109/TFUZZ.2019.2935685 10.1109/TNNLS.2020.2995708 10.1016/j.automatica.2008.08.010 10.1109/TAC.1969.1099088 10.1109/JSYST.2012.2208809 10.1016/j.neucom.2015.10.081 |
| ContentType | Journal Article |
| Copyright | 2021 |
| Copyright_xml | – notice: 2021 |
| DBID | AAYXX CITATION |
| DOI | 10.1016/j.amc.2021.126537 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Mathematics |
| EISSN | 1873-5649 |
| ExternalDocumentID | 10_1016_j_amc_2021_126537 S0096300321006214 |
| GroupedDBID | --K --M -~X .DC .~1 0R~ 1B1 1RT 1~. 1~5 23M 4.4 457 4G. 5GY 6J9 7-5 71M 8P~ 9JN AABNK AACTN AAEDT AAEDW AAIAV AAIKJ AAKOC AALRI AAOAW AAXUO ABAOU ABFNM ABFRF ABJNI ABMAC ABYKQ ACAZW ACDAQ ACGFO ACGFS ACRLP ADBBV ADEZE ADGUI AEBSH AEFWE AEKER AENEX AFKWA AFTJW AGHFR AGUBO AGYEJ AHHHB AIEXJ AIGVJ AIKHN AITUG AJOXV ALMA_UNASSIGNED_HOLDINGS AMFUW AMRAJ ARUGR AXJTR BKOJK BLXMC CS3 EBS EFJIC EFLBG EO8 EO9 EP2 EP3 F5P FDB FIRID FNPLU FYGXN G-Q GBLVA IHE J1W KOM LG9 M26 M41 MHUIS MO0 N9A O-L O9- OAUVE OZT P-8 P-9 P2P PC. Q38 RNS ROL RPZ RXW SBC SDF SDG SES SME SPC SPCBC SSW SSZ T5K TN5 WH7 X6Y XPP ZMT ~02 ~G- 5VS 9DU AAQFI AAQXK AATTM AAXKI AAYWO AAYXX ABEFU ABWVN ABXDB ACLOT ACRPL ACVFH ADCNI ADIYS ADMUD ADNMO AEIPS AEUPX AFFNX AFJKZ AFPUW AGQPQ AI. AIGII AIIUN AKBMS AKRWK AKYEP ANKPU APXCP ASPBG AVWKF AZFZN CITATION EFKBS EJD FEDTE FGOYB G-2 HLZ HMJ HVGLF HZ~ R2- SEW TAE VH1 VOH WUQ ~HD |
| ID | FETCH-LOGICAL-c297t-57aad546c42122ebe320685d92b254b05efcb85f4377f75ff2c4902a07cf01a03 |
| ISICitedReferencesCount | 150 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000697154500002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0096-3003 |
| IngestDate | Tue Nov 18 22:29:45 EST 2025 Sat Nov 29 07:24:38 EST 2025 Fri Feb 23 02:43:49 EST 2024 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Coupled algebraic Riccati equations; Markov jump linear systems; Multiplayer non-zero sum games; Reinforcement learning |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c297t-57aad546c42122ebe320685d92b254b05efcb85f4377f75ff2c4902a07cf01a03 |
| ParticipantIDs | crossref_citationtrail_10_1016_j_amc_2021_126537 crossref_primary_10_1016_j_amc_2021_126537 elsevier_sciencedirect_doi_10_1016_j_amc_2021_126537 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-01-01 2022-01-00 |
| PublicationDateYYYYMMDD | 2022-01-01 |
| PublicationDate_xml | – month: 01 year: 2022 text: 2022-01-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationTitle | Applied mathematics and computation |
| PublicationYear | 2022 |
| Publisher | Elsevier Inc |
| Publisher_xml | – name: Elsevier Inc |
| SSID | ssj0007614 |
| Score | 2.658135 |
| Snippet | • A novel online mode-free integral reinforcement learning algorithm is proposed to solve the multiplayer non-zero sum games. • The online learning is used to... |
| SourceID | crossref elsevier |
| SourceType | Enrichment Source Index Database Publisher |
| StartPage | 126537 |
| SubjectTerms | Coupled algebraic Riccati equations; Markov jump linear systems; Multiplayer non-zero sum games; Reinforcement learning |
| Title | Online reinforcement learning multiplayer non-zero sum games of continuous-time Markov jump linear systems |
| URI | https://dx.doi.org/10.1016/j.amc.2021.126537 |
| Volume | 412 |
| WOSCitedRecordID | wos000697154500002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVESC databaseName: Elsevier SD Freedom Collection Journals 2021 customDbUrl: eissn: 1873-5649 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0007614 issn: 0096-3003 databaseCode: AIEXJ dateStart: 19950101 isFulltext: true titleUrlDefault: https://www.sciencedirect.com providerName: Elsevier |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV1Ja9wwFBZD0kN7KF1p0gUdeqpRkGXLso-hpKSFhkJT8M3IshRsZsbDzHgIvfd_98myPEsX2kIvxhhrQe-z9J786XsIvdYalj1FOdFVBAGKTgzJTMUJxGGlAX8j1ZXqk02Iq6s0z7NPk8k3fxZmMxXzeXp7my3-q6nhGRjbHp39C3OPlcIDuAejwxXMDtc_MrwTDw2WutdEVf32n08OceMJhBI87QAif_JVL9sAuhfcWLqsI5nb5BFd262ITTzfn-ZpN0EDdg9szdLLP692HVvvzc5GGdiVPzK36PZ_9-dOtiCvp_WWBdz1a0FdtcNK6jK4NXLewlTWk3Gnsqpn9ZZLPGx0X8p6d-eCsYOdi_FIzR7j08ZUJKLUzXrazcqpiAhPnLapn7ZjR7_-YQlwuxHNmZxZhUoWnoUs4U5Y5kBZ-7NtyzYFYS9NmE2HfswEz2ByPD5_f5F_GJd0kTiReN83_3u8JwoeNPRzB2fHabl-gO4P0QY-dyh5iCZ6_gjd-7i10WPUOLzgPbxgjxe8gxfs8YIBL7jHC24NPsALdnjBFi_Y4QUPeHmCvry7uH57SYYEHESxTKwJF1JWPE6UpQ0w-NwjRpOUVxkrGY9LyrVRZcpNHAlhBDeGqTijTFKhDA0ljZ6iI-iafoZwFYaaZZXkkdZQmypFCWNlxZF0GaclO0HUj1qhBnV6myRlWngaYlPAQBd2oAs30CfozVhk4aRZfvdy7E1RDL6l8xkLwM2vi53-W7Hn6O4W8C_Q0XrZ6Zfojtqs69Xy1YCu77DMojQ |
| linkProvider | Elsevier |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Online+reinforcement+learning+multiplayer+non-zero+sum+games+of+continuous-time+Markov+jump+linear+systems&rft.jtitle=Applied+mathematics+and+computation&rft.au=Xin%2C+Xilin&rft.au=Tu%2C+Yidong&rft.au=Stojanovic%2C+Vladimir&rft.au=Wang%2C+Hai&rft.date=2022-01-01&rft.pub=Elsevier+Inc&rft.issn=0096-3003&rft.eissn=1873-5649&rft.volume=412&rft_id=info:doi/10.1016%2Fj.amc.2021.126537&rft.externalDocID=S0096300321006214 |
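
The abstract in this record describes a policy iteration that solves N coupled algebraic Riccati equations for a multiplayer non-zero sum game on a Markov jump linear system. The following is a minimal sketch of that general idea, not the authors' algorithm: the paper's method is online and learns from state and input data, whereas this sketch is offline and assumes the system matrices and the transition rates are known. The function names, the feedback convention u_j = -K_ij x, the quadratic cost structure, and the scalar two-mode, two-player example are all illustrative assumptions. Convergence of this kind of iteration is not guaranteed in general; it needs, among other things, stabilizing initial gains.

```python
import numpy as np


def evaluate_policies(A, B, K, Q, R, Pi):
    """Policy evaluation: solve the coupled Lyapunov equations for fixed gains.

    A[i]: drift in mode i, B[i][j]: input matrix of player j in mode i,
    K[i][j]: gain of player j in mode i, Q[i][j]: state weight,
    R[j][k]: weight player j puts on player k's input,
    Pi: transition-rate matrix of the jump process (rows sum to zero).
    """
    M, N, n = len(A), len(B[0]), A[0].shape[0]
    I, nn = np.eye(n), n * n
    # Closed-loop drift per mode: A_i - sum_k B_ik K_ik.
    Acl = [A[i] - sum(B[i][k] @ K[i][k] for k in range(N)) for i in range(M)]
    P = [[None] * N for _ in range(M)]
    for j in range(N):
        # One linear system in the stacked unknowns vec(P_1j), ..., vec(P_Mj).
        L = np.zeros((M * nn, M * nn))
        rhs = np.zeros(M * nn)
        for i in range(M):
            rows = slice(i * nn, (i + 1) * nn)
            # vec(Acl' P + P Acl) = (Acl' (x) I + I (x) Acl') vec(P)
            L[rows, rows] += np.kron(Acl[i].T, I) + np.kron(I, Acl[i].T)
            for l in range(M):
                # Mode coupling term: sum_l Pi[i, l] * P_lj
                L[rows, l * nn:(l + 1) * nn] += Pi[i, l] * np.eye(nn)
            cost = Q[i][j] + sum(K[i][k].T @ R[j][k] @ K[i][k] for k in range(N))
            rhs[rows] = -cost.reshape(-1)
        sol = np.linalg.solve(L, rhs)
        for i in range(M):
            Pij = sol[i * nn:(i + 1) * nn].reshape(n, n)
            P[i][j] = 0.5 * (Pij + Pij.T)  # symmetrize against round-off
    return P


def improve_policies(B, R, P):
    """Policy improvement: K_ij = R_jj^{-1} B_ij' P_ij."""
    M, N = len(P), len(P[0])
    return [[np.linalg.solve(R[j][j], B[i][j].T @ P[i][j]) for j in range(N)]
            for i in range(M)]


def care_residual(A, B, Q, R, Pi, P):
    """Largest absolute entry of the coupled Riccati residuals at P."""
    K = improve_policies(B, R, P)
    M, N = len(P), len(P[0])
    worst = 0.0
    for i in range(M):
        Acl = A[i] - sum(B[i][k] @ K[i][k] for k in range(N))
        for j in range(N):
            res = (Acl.T @ P[i][j] + P[i][j] @ Acl + Q[i][j]
                   + sum(K[i][k].T @ R[j][k] @ K[i][k] for k in range(N))
                   + sum(Pi[i, l] * P[l][j] for l in range(M)))
            worst = max(worst, float(np.abs(res).max()))
    return worst


def policy_iteration(A, B, Q, R, Pi, K0, iters=50):
    """Alternate evaluation and simultaneous improvement for all players."""
    K = K0
    for _ in range(iters):
        P = evaluate_policies(A, B, K, Q, R, Pi)
        K = improve_policies(B, R, P)
    return P, K


# Hypothetical scalar example: 2 modes, 2 players, open-loop stable drift,
# so zero initial gains are admissible.
one, zero = np.eye(1), np.zeros((1, 1))
A = [-1.0 * one, -2.0 * one]
B = [[one, one], [one, one]]
Q = [[one, one], [one, one]]
R = [[one, zero], [zero, one]]
Pi = np.array([[-1.0, 1.0], [1.0, -1.0]])
K0 = [[zero, zero], [zero, zero]]

P, K = policy_iteration(A, B, Q, R, Pi, K0)
print("coupled Riccati residual:", care_residual(A, B, Q, R, Pi, P))
```

Each evaluation step is a single linear solve per player over all modes, because the transition rates couple the value matrices of different modes; the improvement step then updates every player's gain simultaneously. A data-driven variant in the spirit of the abstract would replace the model-based evaluation step with a least-squares fit on measured trajectories, which is not shown here.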