Adaptive Q-Learning Based Model-Free $H_{\infty}$ Control of Continuous-Time Nonlinear Systems: Theory and Application
| Published in: | IEEE Transactions on Emerging Topics in Computational Intelligence, Vol. 9, No. 2, pp. 1143-1152 |
|---|---|
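| Main authors: | Zhao, Jun; Lv, Yongfeng; Wang, Zhangu; Zhao, Ziliang |
| Format: | Journal Article |
| Language: | English |
| Publication details: | IEEE, 2025-04-01 |
| Subject: | $H_{\infty}$ control; Artificial neural networks; Control systems; Cost function; Heuristic algorithms; Learning law; Nonlinear systems; Q-learning; Reinforcement learning; System dynamics |
| ISSN: | 2471-285X |
| Online access: | https://ieeexplore.ieee.org/document/10663220 |
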
| Abstract | Although model-based $H_{\infty}$ control schemes for nonlinear continuous-time (CT) systems with unknown dynamics have been extensively studied, model-free $H_{\infty}$ control of nonlinear CT systems via Q-learning remains a challenging problem. This paper develops a novel Q-learning-based model-free $H_{\infty}$ control scheme for nonlinear CT systems in which the adaptive critic and actor update each other continuously and simultaneously, eliminating the need for iterative steps. As a result, a hybrid structure is avoided and an initial stabilizing control policy is no longer required. To obtain the $H_{\infty}$ control of the CT nonlinear system, a Q-learning strategy is introduced to solve the $H_{\infty}$ control problem online in a non-iterative manner, without requiring knowledge of the system dynamics. In addition, a new learning law based on a sliding mode scheme is developed to update the critic neural network (NN) weights online. Owing to the strong convergence of the critic NN weights, the actor NN used in most $H_{\infty}$ control algorithms is removed. Finally, numerical simulations and experimental results for an adaptive cruise control (ACC) system on a real vehicle demonstrate the feasibility of the presented control method and learning algorithm. |
|---|---|
| Author | Zhao, Jun; Zhao, Ziliang; Lv, Yongfeng; Wang, Zhangu |
| Author_xml | 1. Zhao, Jun (ORCID: 0000-0003-2908-2583; email: skdzhaojun@sdust.edu.cn), College of Transportation, Shandong Key Laboratory of Hydrogen Electric Hybrid Power System Control and Safety, Shandong University of Science and Technology, Qingdao, China. 2. Lv, Yongfeng (ORCID: 0000-0002-9139-7220; email: lvyilian1989@foxmail.com), College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan, China. 3. Wang, Zhangu (ORCID: 0000-0001-5251-1580; email: wangzhangu1@163.com), College of Transportation, Shandong Key Laboratory of Hydrogen Electric Hybrid Power System Control and Safety, Shandong University of Science and Technology, Qingdao, China. 4. Zhao, Ziliang (ORCID: 0000-0002-1629-677X; email: zhaoziliang1@sdust.edu.cn), College of Transportation, Shandong Key Laboratory of Hydrogen Electric Hybrid Power System Control and Safety, Shandong University of Science and Technology, Qingdao, China. |
| CODEN | ITETCU |
| CitedBy_id | crossref_primary_10_1088_1402_4896_ad98cc crossref_primary_10_1109_TASE_2025_3603734 crossref_primary_10_1038_s41598_025_97930_3 crossref_primary_10_1109_JPHOT_2024_3471827 |
| ContentType | Journal Article |
| DOI | 10.1109/TETCI.2024.3449870 |
| EISSN | 2471-285X |
| EndPage | 1152 |
| ExternalDocumentID | 10_1109_TETCI_2024_3449870 10663220 |
| Genre | orig-research |
| GrantInformation_xml | Anhui Province Key Laboratory of Advanced Numerical Control Servo Technology (grant XJSK202301); Natural Science Foundation of Shandong Provincial (grant ZR2022QF011); Development Plan for Youth Innovation Teams in Higher Education Institutions in Shandong Province (grant 2023KJ094); National Natural Science Foundation of China (grants 62203279, 62103296; funder ID 10.13039/501100001809) |
| ISICitedReferencesCount | 5 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001308170300001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 2471-285X |
| IngestDate | Sat Nov 29 08:04:54 EST 2025 Tue Nov 18 21:13:14 EST 2025 Wed Aug 27 01:37:13 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| ORCID | 0000-0002-1629-677X 0000-0002-9139-7220 0000-0003-2908-2583 0000-0001-5251-1580 |
| PageCount | 10 |
| ParticipantIDs | ieee_primary_10663220 crossref_citationtrail_10_1109_TETCI_2024_3449870 crossref_primary_10_1109_TETCI_2024_3449870 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-04-01 |
| PublicationDateYYYYMMDD | 2025-04-01 |
| PublicationDate_xml | – month: 04 year: 2025 text: 2025-04-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE Transactions on Emerging Topics in Computational Intelligence |
| PublicationTitleAbbrev | TETCI |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | crossref ieee |
| SourceType | Enrichment Source Index Database Publisher |
| StartPage | 1143 |
| SubjectTerms | $H_{\infty}$ control; Artificial neural networks; Control systems; Cost function; Heuristic algorithms; Learning law; Nonlinear systems; Q-learning; Reinforcement learning; System dynamics |
| Title | Adaptive Q-Learning Based Model-Free $H_{\infty}$ Control of Continuous-Time Nonlinear Systems: Theory and Application |
| URI | https://ieeexplore.ieee.org/document/10663220 |
| Volume | 9 |
| WOSCitedRecordID | wos001308170300001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |