Zeroth-Order Nonconvex Stochastic Optimization: Handling Constraints, High Dimensionality, and Saddle Points
In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on addressing constrained optimization, high-dimensional setting, and saddle point avoiding. To handle constrained optimization, we first propose generalizations...
Uloženo v:
| Vydáno v: | Foundations of computational mathematics Ročník 22; číslo 1; s. 35 - 76 |
|---|---|
| Hlavní autoři: | , |
| Médium: | Journal Article |
| Jazyk: | angličtina |
| Vydáno: |
New York
Springer US
01.02.2022
Springer Nature B.V |
| Témata: | |
| ISSN: | 1615-3375, 1615-3383 |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on addressing constrained optimization, high-dimensional setting, and saddle point avoiding. To handle constrained optimization, we first propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. To facilitate zeroth-order optimization in high dimensions, we explore the advantages of structural sparsity assumptions. Specifically, (i) we highlight an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the step size and (ii) propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate of convergence depends only poly-logarithmically on the dimensionality. We next focus on avoiding saddle points in nonconvex setting. Toward that, we interpret the Gaussian smoothing technique for estimating gradient based on zeroth-order information as an instantiation of first-order Stein’s identity. Based on this, we provide a novel linear-(in dimension) time estimator of the Hessian matrix of a function using only zeroth-order information, which is based on second-order Stein’s identity. We then provide a zeroth-order variant of cubic regularized Newton method for avoiding saddle points and discuss its rate of convergence to local minima. |
|---|---|
| Bibliografie: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 1615-3375 1615-3383 |
| DOI: | 10.1007/s10208-021-09499-8 |