Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback

Detailed bibliography
Published in: IEEE Transactions on Cybernetics, Volume 50, Issue 11, pp. 4670-4679
Main authors: Rizvi, Syed Ali Asad; Lin, Zongli
Medium: Journal Article
Language: English
Publication details: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2020
ISSN: 2168-2267; EISSN: 2168-2275
Online access: Get full text
Abstract In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input-output data, without requiring model information. A state parametrization scheme is presented which reconstructs the system state based on the filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem based on policy iteration and value iteration (VI). Unlike the existing output feedback methods for continuous-time systems, the need to apply discrete approximation is obviated. In contrast with the static output feedback controllers, the proposed method can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it stands immune to the exploration bias issue. Moreover, it does not require a discounted cost function and, thus, ensures the closed-loop stability and the optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms.
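For readers who want to connect the abstract's policy-iteration route to the standard model-based theory, the sketch below implements Kleinman's policy iteration for the continuous-time LQR and checks it against the algebraic Riccati equation (ARE) solution. This is only a conceptual reference, written in Python with NumPy/SciPy on a hypothetical second-order system; it is not the paper's algorithm, which is model-free, uses only input-output data, and (in its value-iteration form) needs no initially stabilizing policy, whereas the iteration below requires both a model and a stabilizing initial gain.

import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical example system (not from the paper): x' = Ax + Bu, cost integrand x'Qx + u'Ru.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def kleinman_policy_iteration(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration (Kleinman's algorithm).
    Policy evaluation: solve (A - B K)^T P + P (A - B K) + Q + K^T R K = 0.
    Policy improvement: K <- R^{-1} B^T P.
    K0 must stabilize A - B K0; the iterates converge to the ARE solution."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))  # policy evaluation
        K = np.linalg.solve(R, B.T @ P)                          # policy improvement
    return P, K

# A is already Hurwitz here, so K0 = 0 is a valid stabilizing initial gain.
P_pi, K_pi = kleinman_policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
P_are = solve_continuous_are(A, B, Q, R)
print(np.allclose(P_pi, P_are))  # True: policy iteration recovers the ARE solution

The paper's contribution, per the abstract, is to reach the same limit without knowledge of A and B and without full state measurement, by parameterizing the state through filtered input-output signals and learning the unknowns of the resulting Bellman equations from measured data.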
Author Rizvi, Syed Ali Asad (ORCID: 0000-0003-1412-8841; email: sr9gs@virginia.edu; Charles L. Brown Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, USA)
Lin, Zongli (ORCID: 0000-0003-1589-1443; email: zl5y@virginia.edu; Charles L. Brown Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, USA)
CODEN ITCEB8
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DOI 10.1109/TCYB.2018.2886735
EISSN 2168-2275
EndPage 4679
ExternalDocumentID 30605117
10_1109_TCYB_2018_2886735
8600378
Genre orig-research
Journal Article
ISSN 2168-2267
2168-2275
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PMID 30605117
PageCount 10
PublicationDate 2020-11-01
PublicationPlace United States
PublicationTitle IEEE transactions on cybernetics
PublicationTitleAbbrev TCYB
PublicationTitleAlternate IEEE Trans Cybern
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 4670
SubjectTerms Adaptive dynamic programming (ADP)
Algorithms
Continuous time systems
Cost function
Dynamic programming
Feedback control
Iterative methods
Learning
Linear quadratic regulator
linear quadratic regulator (LQR)
Mathematical model
Mathematical models
Optimal control
Optimization
Output feedback
Parameter estimation
Parameterization
reinforcement learning (RL)
Riccati equation
Stability analysis
State feedback
Title Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback
URI https://ieeexplore.ieee.org/document/8600378
https://www.ncbi.nlm.nih.gov/pubmed/30605117
https://www.proquest.com/docview/2456527746
https://www.proquest.com/docview/2163012127
Volume 50