Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm

In this paper, a reinforcement Q-learning method based on value iteration (VI) is proposed for a class of model-free stochastic linear quadratic (SLQ) optimal tracking problems with time delay. Compared with traditional reinforcement learning methods, the Q-learning method avoids the need for an accurate...

Bibliographic details
Published in: AIMS mathematics Vol. 8; no. 5; pp. 10249 - 10265
Main authors: Tan, Xufeng; Li, Yuan; Liu, Yang
Format: Journal Article
Language: English
Published: AIMS Press, 2023-01-01
ISSN: 2473-6988
Abstract In this paper, a reinforcement Q-learning method based on value iteration (VI) is proposed for a class of model-free stochastic linear quadratic (SLQ) optimal tracking problems with time delay. Compared with traditional reinforcement learning methods, the Q-learning method avoids the need for an accurate system model. First, a delay operator is introduced to construct a novel augmented system composed of the original system and the command generator. Second, the SLQ optimal tracking problem is transformed into a deterministic one by a system transformation, and the corresponding Q-function of SLQ optimal tracking control is derived. Based on this, a Q-learning algorithm is proposed and its convergence is proved. Finally, a simulation example shows the effectiveness of the proposed algorithm.
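Note: the abstract describes value-iteration Q-learning on a deterministic, augmented linear quadratic problem. The following is only a minimal sketch of that general idea, not the authors' algorithm: it omits the delay operator, the command generator, and the stochastic terms, and it assumes a quadratic Q-function Q(z, u) = [z; u]^T H [z; u] with a discount factor. The matrices A, B, Qc, R, the value of gamma, and all function names are hypothetical choices made for illustration. The model is used only to generate data; the learning loop sees sampled transitions and stage costs.

```python
import numpy as np

def quad_features(w):
    """Features phi(w) such that phi(w) @ theta == w.T @ H @ w for symmetric H."""
    outer = np.outer(w, w)
    iu = np.triu_indices(w.size)
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)  # off-diagonal terms appear twice
    return outer[iu] * scale

def theta_to_H(theta, n):
    """Rebuild the symmetric matrix H from its upper-triangular entries."""
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = theta
    return H + np.triu(H, 1).T

def q_learning_vi(Z, U, Z_next, stage_cost, gamma=0.9, iters=200):
    """Value-iteration Q-learning from data (z_k, u_k, z_{k+1}, cost_k)."""
    nz, nu = Z.shape[1], U.shape[1]
    Phi = np.array([quad_features(w) for w in np.hstack([Z, U])])
    P = np.zeros((nz, nz))  # value matrix of the zero initial Q-function
    for _ in range(iters):
        # Bellman target: stage cost plus discounted minimum of the current Q
        targets = stage_cost + gamma * np.einsum("ij,jk,ik->i", Z_next, P, Z_next)
        theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
        H = theta_to_H(theta, nz + nu)
        Hzz, Hzu, Huu = H[:nz, :nz], H[:nz, nz:], H[nz:, nz:]
        K = np.linalg.solve(Huu, Hzu.T)  # greedy policy u = -K z
        P = Hzz - Hzu @ K                # min over u of the fitted Q
    return H, K, P

# Hypothetical system, used only to simulate transitions (assumption).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc, R, gamma = np.eye(2), np.array([[1.0]]), 0.9

rng = np.random.default_rng(0)
Z = rng.normal(size=(60, 2))   # sampled states
U = rng.normal(size=(60, 1))   # exploratory inputs
Z_next = Z @ A.T + U @ B.T
cost = np.einsum("ij,jk,ik->i", Z, Qc, Z) + np.einsum("ij,jk,ik->i", U, R, U)

H, K, P = q_learning_vi(Z, U, Z_next, cost, gamma=gamma)
print("learned gain K =", K)
```

In this sketch the least-squares fit plays the role of the Q-function evaluation step, and the learned P can be checked against the discounted Riccati equation of the assumed model.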
Author Li, Yuan
Tan, Xufeng
Liu, Yang
Author_xml – sequence: 1
  givenname: Xufeng
  surname: Tan
  fullname: Tan, Xufeng
  organization: School of Science, Shenyang University of Technology, Shenyang 110870, China
– sequence: 2
  givenname: Yuan
  surname: Li
  fullname: Li, Yuan
  organization: School of Science, Shenyang University of Technology, Shenyang 110870, China
– sequence: 3
  givenname: Yang
  surname: Liu
  fullname: Liu, Yang
  organization: School of Electrical and Electronic Engineering, Shenyang University of Technology, Shenyang 110870, China
ContentType Journal Article
DBID AAYXX
CITATION
DOA
DOI 10.3934/math.2023519
DatabaseName CrossRef
DOAJ Directory of Open Access Journals
DatabaseTitle CrossRef
DatabaseTitleList
CrossRef
Database_xml – sequence: 1
  dbid: DOA
  name: DOAJ Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
DeliveryMethod fulltext_linktorsrc
Discipline Mathematics
EISSN 2473-6988
EndPage 10265
ExternalDocumentID oai_doaj_org_article_0a486d8a0f0341b7868ffefd28d0da5c
10_3934_math_2023519
IEDL.DBID DOA
ISICitedReferencesCount 1
ISSN 2473-6988
IngestDate Fri Oct 03 12:40:18 EDT 2025
Sat Nov 29 06:04:29 EST 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Language English
LinkModel DirectLink
OpenAccessLink https://doaj.org/article/0a486d8a0f0341b7868ffefd28d0da5c
PageCount 17
ParticipantIDs doaj_primary_oai_doaj_org_article_0a486d8a0f0341b7868ffefd28d0da5c
crossref_primary_10_3934_math_2023519
PublicationCentury 2000
PublicationDate 2023-01-01
PublicationDateYYYYMMDD 2023-01-01
PublicationDate_xml – month: 01
  year: 2023
  text: 2023-01-01
  day: 01
PublicationDecade 2020
PublicationTitle AIMS mathematics
PublicationYear 2023
Publisher AIMS Press
Publisher_xml – name: AIMS Press
SourceID doaj
crossref
SourceType Open Website
Index Database
StartPage 10249
SubjectTerms deterministic system
model-free
reinforcement q-learning
stochastic linear quadratic optimal tracking
time delay
value iterative
Title Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm
URI https://doaj.org/article/0a486d8a0f0341b7868ffefd28d0da5c
Volume 8