Quasi-Stochastic Approximation and Off-Policy Reinforcement Learning

Bibliographic Details
Published in: Proceedings of the IEEE Conference on Decision & Control (CDC), pp. 5244–5251
Main Authors: Bernstein, Andrey, Chen, Yue, Colombino, Marcello, Dall'Anese, Emiliano, Mehta, Prashant, Meyn, Sean
Format: Conference Proceeding
Language: English
Published: IEEE, December 2019
Subjects: Approximation algorithms; Convergence; Government; Monte Carlo methods; Optimization; Perturbation methods; Stochastic processes
ISSN: 2576-2370
Online Access: https://ieeexplore.ieee.org/document/9029247
Abstract: The Robbins-Monro stochastic approximation algorithm is a foundation of many algorithmic frameworks for reinforcement learning (RL), and is often an efficient approach to solving (or approximating the solution to) complex optimal control problems. However, in many cases practitioners are unable to apply these techniques because of an inherent high variance. This paper aims to provide a general foundation for "quasi-stochastic approximation," in which all of the processes under consideration are deterministic, much like quasi-Monte Carlo for variance reduction in simulation. The variance reduction can be substantial, subject to tuning of pertinent parameters in the algorithm. The paper introduces a new coupling argument to establish an optimal rate of convergence, provided the gain is sufficiently large. These results are established for linear models and are also tested in non-ideal settings. A major application of these general results is a new class of RL algorithms for deterministic state space models. In this setting, the main contribution is a class of algorithms for approximating the value function for a given policy, using a different policy designed to introduce exploration.
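The abstract's central idea, replacing the i.i.d. noise in a Robbins-Monro recursion with a deterministic probing signal, can be illustrated on a toy root-finding problem. The sketch below is not the paper's algorithm: the scalar linear model, the sinusoidal probe, and the gain value are all illustrative assumptions.

```python
import numpy as np

# Quasi-stochastic approximation (QSA) sketch: a Robbins-Monro-style recursion
# driven by a deterministic probing signal instead of random noise.
# The model, probe frequency, and gain below are assumptions for illustration.

theta_star = 2.0                 # root the recursion should find (assumed)

def f(theta, xi):
    """Observed vector field: stable linear mean field plus probing term xi."""
    return -(theta - theta_star) + xi

g = 2.0                          # gain; the rate results cited in the abstract need this large enough
theta = 0.0
N = 10_000
for n in range(1, N + 1):
    xi_n = np.cos(2.0 * np.pi * 0.01 * n)   # deterministic probe (assumed sinusoid)
    a_n = g / n                              # vanishing step size
    theta += a_n * f(theta, xi_n)

print(f"QSA estimate: {theta:.4f}   target theta*: {theta_star}")
```

Because the probe is deterministic and averages out over its period, the recursion settles near theta_star with no randomness entering the update; this is the sense in which the abstract claims QSA removes the variance that limits standard stochastic approximation.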
Authors and affiliations:
1. Andrey Bernstein (NREL, Golden, Colorado)
2. Yue Chen (NREL, Golden, Colorado)
3. Marcello Colombino (NREL, Golden, Colorado)
4. Emiliano Dall'Anese (University of Colorado Boulder, Department of ECEE)
5. Prashant Mehta (University of Illinois Urbana-Champaign, Department of MAE)
6. Sean Meyn (University of Florida, Gainesville, Department of ECE)
DOI: 10.1109/CDC40024.2019.9029247
EISBN: 9781728113982 (also listed as 1728113989)
EISSN: 2576-2370