High-Dimensional Nonconvex Stochastic Optimization by Doubly Stochastic Successive Convex Approximation
| Published in: | IEEE Transactions on Signal Processing, Vol. 68, pp. 6287-6302 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2020 |
| Subjects: | |
| ISSN: | 1053-587X, 1941-0476 |
| Online access: | Full text |
| Abstract: | In this paper, we consider supervised learning problems over training sets in which the number of training examples and the dimension of feature vectors are both large. We focus on the case where the loss function defining the quality of the parameter we wish to estimate may be non-convex, but also has a convex regularization. We propose a Doubly Stochastic Successive Convex approximation scheme (DSSC) able to handle non-convex regularized expected risk minimization. The method operates by decomposing the decision variable into blocks and operating on random subsets of blocks at each step (fusing the merits of stochastic approximation with block coordinate methods), and then implements successive convex approximation. In contrast to many stochastic convex methods whose almost sure behavior is not guaranteed in non-convex settings, DSSC attains almost sure convergence to a stationary solution of the problem. Moreover, we show that the proposed DSSC algorithm achieves stationarity at a rate of $\mathcal{O}((\log t)/t^{1/4})$. Numerical experiments on a non-convex variant of a lasso regression problem show that DSSC performs favorably in this setting. We then apply this method to the task of dictionary learning from high-dimensional visual data collected from a ground robot, and observe reliable convergence behavior for a difficult non-convex stochastic program. |
|---|---|
| DOI: | 10.1109/TSP.2020.3033354 |
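The abstract above only outlines the method at a high level. As a rough illustration of the ingredients it names (a random coordinate block per step, a stochastic gradient from a data minibatch, a convex surrogate solved on the block, and an averaged update), the sketch below uses a proximal-gradient step on an $\ell_1$-regularized problem as the surrogate minimizer. All function names, step-size schedules, and the choice of surrogate here are illustrative assumptions, not the authors' exact DSSC algorithm:

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (closed-form surrogate minimizer).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def dssc_sketch(grad_fn, x0, n_blocks, lam, steps=1000, seed=0):
    """Illustrative doubly stochastic SCA-style loop (not the paper's exact method).

    Per step: sample a random coordinate block and a stochastic gradient
    (grad_fn draws its own data minibatch), minimize a simple quadratic
    surrogate of the loss plus the l1 regularizer on that block via
    soft-thresholding, then take an averaged (diminishing-weight) update.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    for t in range(1, steps + 1):
        b = blocks[rng.integers(n_blocks)]          # random block of coordinates
        g = grad_fn(x, rng)                         # stochastic gradient estimate
        alpha = 1.0 / np.sqrt(t)                    # surrogate step size (assumed schedule)
        gamma = 1.0 / t ** 0.75                     # averaging weight (assumed schedule)
        # Surrogate minimizer restricted to the block: one prox-gradient step.
        x_hat_b = soft_threshold(x[b] - alpha * g[b], alpha * lam)
        x[b] = (1 - gamma) * x[b] + gamma * x_hat_b  # averaged block update
    return x
```

For example, `grad_fn` could return a minibatch gradient of a least-squares loss; only the sampled block of the returned gradient is used, so both the data dimension and the feature dimension are subsampled at each step, which is the "doubly stochastic" structure the abstract describes.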