Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm
In this paper, a reinforcement Q-learning method based on value iteration (VI) is proposed for a class of model-free stochastic linear quadratic (SLQ) optimal tracking problems with time delay. Compared with traditional reinforcement learning methods, the Q-learning method avoids the need for an accurate...
| Published in: | AIMS Mathematics, Vol. 8, No. 5, pp. 10249-10265 |
|---|---|
| Main authors: | Xufeng Tan, Yuan Li, Yang Liu |
| Format: | Journal Article |
| Language: | English |
| Published: | AIMS Press, 2023-01-01 |
| Subjects: | |
| ISSN: | 2473-6988 |
| Online access: | Get full text |
| Abstract | In this paper, a reinforcement Q-learning method based on value iteration (VI) is proposed for a class of model-free stochastic linear quadratic (SLQ) optimal tracking problems with time delay. Compared with traditional reinforcement learning methods, the Q-learning method avoids the need for an accurate system model. First, a delay operator is introduced to construct a novel augmented system composed of the original system and the command generator. Second, the SLQ optimal tracking problem is transformed into a deterministic one by a system transformation, and the corresponding Q-function of the SLQ optimal tracking control is derived. Based on this, a Q-learning algorithm is proposed and its convergence is proved. Finally, a simulation example shows the effectiveness of the proposed algorithm. |
|---|---|
| Author | Tan, Xufeng; Li, Yuan; Liu, Yang |
| Author affiliations | Xufeng Tan: School of Science, Shenyang University of Technology, Shenyang 110870, China. Yuan Li: School of Science, Shenyang University of Technology, Shenyang 110870, China. Yang Liu: School of Electrical and Electronic Engineering, Shenyang University of Technology, Shenyang 110870, China. |
| ContentType | Journal Article |
| DOI | 10.3934/math.2023519 |
| Discipline | Mathematics |
| EISSN | 2473-6988 |
| EndPage | 10265 |
| ISICitedReferencesCount | 1 |
| ISSN | 2473-6988 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| OpenAccessLink | https://doaj.org/article/0a486d8a0f0341b7868ffefd28d0da5c |
| PageCount | 17 |
| PublicationDate | 2023-01-01 |
| PublicationTitle | AIMS mathematics |
| PublicationYear | 2023 |
| Publisher | AIMS Press |
| StartPage | 10249 |
| SubjectTerms | deterministic system; model-free; reinforcement Q-learning; stochastic linear quadratic optimal tracking; time delay; value iterative |
| Title | Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm |
| URI | https://doaj.org/article/0a486d8a0f0341b7868ffefd28d0da5c |
| Volume | 8 |
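
The abstract describes a Q-learning algorithm based on value iteration that learns the Q-function from data instead of from a system model. The sketch below illustrates that general idea only, under strong simplifying assumptions that are not taken from the paper: a scalar, deterministic, delay-free regulation problem with no command generator, a quadratic Q-function `Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2`, and a least-squares fit on a handful of hand-picked samples. All names (`simulate`, `q_learning_vi`, `A_TRUE`, `B_TRUE`) and numeric values are hypothetical.

```python
# Hypothetical scalar plant x_next = a*x + b*u, used ONLY to generate data.
# The learner never reads A_TRUE or B_TRUE (assumed values, not from the paper).
A_TRUE, B_TRUE = 0.9, 0.5
Q_W, R_W = 1.0, 1.0  # assumed state and input weights of the stage cost


def simulate(x, u):
    """Return (next state, stage cost) for one step of the hidden plant."""
    return A_TRUE * x + B_TRUE * u, Q_W * x * x + R_W * u * u


def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def q_learning_vi(samples, iterations=200):
    """Value iteration on the Q-function kernel using data only.

    samples: list of (x, u, x_next, cost) tuples.
    Returns (h_xx, h_xu, h_uu).
    """
    h_xx, h_xu, h_uu = 0.0, 0.0, 0.0  # start from Q_0 = 0
    # Regression features (x^2, 2xu, u^2) do not change between iterations.
    feats = [(x * x, 2 * x * u, u * u) for x, u, _, _ in samples]
    gram = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    for _ in range(iterations):
        # min over u of Q_j(x, u) equals p * x^2 for a quadratic Q-function.
        p = h_xx - h_xu ** 2 / h_uu if h_uu > 0 else 0.0
        # Bellman target: stage cost plus minimised Q_j at the next state.
        targets = [c + p * xn * xn for _, _, xn, c in samples]
        rhs = [sum(f[i] * t for f, t in zip(feats, targets)) for i in range(3)]
        h_xx, h_xu, h_uu = solve_linear(gram, rhs)  # least-squares fit of Q_{j+1}
    return h_xx, h_xu, h_uu


# Hand-picked (x, u) pairs that make the three features linearly independent.
samples = []
for x, u in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0), (2.0, 1.0)]:
    x_next, cost = simulate(x, u)
    samples.append((x, u, x_next, cost))

h_xx, h_xu, h_uu = q_learning_vi(samples)
gain = h_xu / h_uu  # greedy feedback u = -gain * x
print("learned kernel:", h_xx, h_xu, h_uu, "gain:", gain)
```

In this toy setting the greedy gain can be checked against the scalar Riccati equation, because the plant is known to whoever wrote the simulator. In the model-free setting of the abstract no such check is available, which is presumably why the paper proves convergence.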