Revisiting the ODE Method for Recursive Algorithms: Fast Convergence Using Quasi Stochastic Approximation

Detailed Bibliography
Published in: Journal of Systems Science and Complexity, Volume 34, Issue 5, pp. 1681-1702
Main authors: Chen, Shuhang; Devraj, Adithya; Bernstein, Andrey; Meyn, Sean
Format: Journal Article
Language: English
Publication details: Beijing: Academy of Mathematics and Systems Science, Chinese Academy of Sciences; Springer Nature B.V., 1 October 2021
ISSN: 1009-6124, 1559-7067
Description
Summary: Several decades ago, Profs. Sean Meyn and Lei Guo were postdoctoral fellows at ANU, where they shared an interest in recursive algorithms. It seems fitting to celebrate Lei Guo’s 60th birthday with a review of the ODE Method and its recent evolution, with focus on the following themes: The method has been regarded as a technique for algorithm analysis. It is argued that this viewpoint is backwards: The original stochastic approximation method was surely motivated by an ODE, and tools for analysis came much later (based on establishing robustness of Euler approximations). The paper presents a brief survey of recent research in machine learning that shows the power of algorithm design in continuous time, followed by careful approximation to obtain a practical recursive algorithm. While these methods are usually presented in a stochastic setting, this is not a prerequisite. In fact, recent theory shows that rates of convergence can be dramatically accelerated by applying techniques inspired by quasi Monte-Carlo. Subject to conditions, the optimal rate of convergence can be obtained by applying the averaging technique of Polyak and Ruppert. The conditions are not universal, but theory suggests alternatives to achieve acceleration. The theory is illustrated with applications to gradient-free optimization, and policy gradient algorithms for reinforcement learning.
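To make the summary's central idea concrete, the following is a minimal Python sketch of quasi stochastic approximation applied to gradient-free optimization. It illustrates the general technique, not the specific algorithm of the paper: the i.i.d. exploration noise of a classical stochastic approximation recursion is replaced by a deterministic sinusoidal probing signal (the quasi Monte-Carlo idea, since empirical averages of such signals converge faster than those of random noise), and a Polyak-Ruppert average of the iterates is returned alongside the final iterate. All function names and parameter values below are illustrative assumptions.

import numpy as np

def qsa_gradient_free(f, theta0, n_iter=20000, eps=0.1, a0=0.1, rho=0.85):
    """Minimize f from function values only, via quasi stochastic approximation.

    Illustrative sketch: a deterministic sinusoidal probing signal xi_n
    replaces i.i.d. exploration noise; returns the final iterate and a
    Polyak-Ruppert average of the trajectory.
    """
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    # Distinct, incommensurate frequencies so the probing signal has
    # rapidly converging empirical averages (quasi Monte-Carlo flavor).
    omega = 0.5 * np.pi * (1.0 + np.arange(d)) / (d + np.sqrt(2.0))
    pr_sum = np.zeros(d)
    for n in range(1, n_iter + 1):
        xi = np.sqrt(2.0) * np.sin(omega * n)  # zero mean, unit average power
        a_n = a0 / n**rho                      # vanishing step-size sequence
        # Two-sided finite difference along xi: averaged over n,
        # xi * (xi . grad f) approximates grad f(theta).
        g = xi * (f(theta + eps * xi) - f(theta - eps * xi)) / (2.0 * eps)
        theta = theta - a_n * g
        pr_sum += theta
    return theta, pr_sum / n_iter  # final iterate and Polyak-Ruppert average

# Example: a quadratic with minimizer (1, -2), accessed through values only.
f = lambda th: float(np.sum((th - np.array([1.0, -2.0]))**2))
theta_last, theta_avg = qsa_gradient_free(f, theta0=[0.0, 0.0])
print(theta_last, theta_avg)  # both should approach [1, -2]

The same recursion with i.i.d. Gaussian probing in place of the sinusoids is the classical stochastic setting the summary mentions; the deterministic signal is what permits the accelerated convergence rates, with the averaged iterate recovering the optimal rate under the conditions discussed in the paper.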
DOI: 10.1007/s11424-021-1251-5