High-Dimensional Nonconvex Stochastic Optimization by Doubly Stochastic Successive Convex Approximation

Detailed bibliography
Published in: IEEE Transactions on Signal Processing, Volume 68, pp. 6287-6302
Main authors: Mokhtari, Aryan; Koppel, Alec
Format: Journal Article
Language: English
Publication details: New York: IEEE, 2020
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:1053-587X, 1941-0476
Description
Summary: In this paper, we consider supervised learning problems over training sets in which the number of training examples and the dimension of feature vectors are both large. We focus on the case where the loss function defining the quality of the parameter we wish to estimate may be non-convex, but also has a convex regularization. We propose a Doubly Stochastic Successive Convex approximation scheme (DSSC) able to handle non-convex regularized expected risk minimization. The method operates by decomposing the decision variable into blocks and operating on random subsets of blocks at each step (fusing the merits of stochastic approximation with block coordinate methods), and then implements successive convex approximation. In contrast to many stochastic convex methods whose almost sure behavior is not guaranteed in non-convex settings, DSSC attains almost sure convergence to a stationary solution of the problem. Moreover, we show that the proposed DSSC algorithm achieves stationarity at a rate of $\mathcal{O}((\log t)/t^{1/4})$. Numerical experiments on a non-convex variant of a lasso regression problem show that DSSC performs favorably in this setting. We then apply this method to the task of dictionary learning from high-dimensional visual data collected from a ground robot, and observe reliable convergence behavior for a difficult non-convex stochastic program.
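The update the abstract describes (draw a random data sample and a random block of coordinates, then minimize a convex surrogate over that block) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the proximal-gradient surrogate, the damped step-size schedule, and the helper names grad_sample and prox_g are assumptions chosen to mirror the non-convex-loss-plus-convex-regularizer setting.

```python
import numpy as np

def dssc_sketch(grad_sample, prox_g, x0, n_blocks, n_iters,
                step=lambda t: 1.0 / (t + 1) ** 0.75, tau=1.0, rng=None):
    """Illustrative doubly stochastic successive-convex-approximation loop.

    grad_sample(x, t) -- stochastic gradient of the (possibly non-convex)
                         expected loss at x, computed from a fresh random sample.
    prox_g(v, lam)    -- proximal operator of the convex regularizer g.
    All schedules and the block partition are illustrative choices.
    """
    rng = rng or np.random.default_rng()
    x = x0.copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    for t in range(n_iters):
        b = blocks[rng.integers(n_blocks)]   # random block of coordinates
        g_t = grad_sample(x, t)              # stochastic gradient from a random sample
        # Convex surrogate restricted to block b:
        #   g_t[b]^T (z - x[b]) + (tau/2) * ||z - x[b]||^2 + g(z),
        # whose minimizer is a proximal gradient step on that block.
        z = prox_g(x[b] - g_t[b] / tau, 1.0 / tau)
        x[b] = x[b] + step(t) * (z - x[b])   # damped update of the selected block
    return x

# Example regularizer: soft-thresholding, the prox of lam * ||x||_1.
soft_threshold = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - s, 0.0)
```

Under these assumptions, each iteration touches only one block of the (high-dimensional) variable and one random sample, which is the "doubly stochastic" aspect the abstract refers to.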
DOI:10.1109/TSP.2020.3033354