High-Dimensional Nonconvex Stochastic Optimization by Doubly Stochastic Successive Convex Approximation
| Published in: | IEEE Transactions on Signal Processing, Vol. 68, pp. 6287-6302 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2020 |
| Subjects: | |
| ISSN: | 1053-587X, 1941-0476 |
| Online access: | Full text |
| Abstract: | In this paper, we consider supervised learning problems over training sets in which the number of training examples and the dimension of feature vectors are both large. We focus on the case where the loss function defining the quality of the parameter we wish to estimate may be non-convex, but also has a convex regularization. We propose a Doubly Stochastic Successive Convex approximation scheme (DSSC) able to handle non-convex regularized expected risk minimization. The method operates by decomposing the decision variable into blocks and operating on random subsets of blocks at each step (fusing the merits of stochastic approximation with block coordinate methods), and then implements successive convex approximation. In contrast to many stochastic convex methods whose almost sure behavior is not guaranteed in non-convex settings, DSSC attains almost sure convergence to a stationary solution of the problem. Moreover, we show that the proposed DSSC algorithm achieves stationarity at a rate of $\mathcal{O}((\log t)/t^{1/4})$. Numerical experiments on a non-convex variant of a lasso regression problem show that DSSC performs favorably in this setting. We then apply this method to the task of dictionary learning from high-dimensional visual data collected from a ground robot, and observe reliable convergence behavior for a difficult non-convex stochastic program. |
|---|---|
| DOI: | 10.1109/TSP.2020.3033354 |
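The abstract above only outlines the method at a high level. As a rough illustration of the ingredients it names (a random coordinate block per step, a stochastic gradient from a data minibatch, a convex surrogate solved on the block, and an averaged update), the sketch below uses a proximal-gradient step on an $\ell_1$-regularized problem as the surrogate minimizer. All function names, step-size schedules, and the choice of surrogate here are illustrative assumptions, not the authors' exact DSSC algorithm:

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (closed-form surrogate minimizer).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def dssc_sketch(grad_fn, x0, n_blocks, lam, steps=1000, seed=0):
    """Illustrative doubly stochastic SCA-style loop (not the paper's exact method).

    Per step: sample a random coordinate block and a stochastic gradient
    (grad_fn draws its own data minibatch), minimize a simple quadratic
    surrogate of the loss plus the l1 regularizer on that block via
    soft-thresholding, then take an averaged (diminishing-weight) update.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    for t in range(1, steps + 1):
        b = blocks[rng.integers(n_blocks)]          # random block of coordinates
        g = grad_fn(x, rng)                         # stochastic gradient estimate
        alpha = 1.0 / np.sqrt(t)                    # surrogate step size (assumed schedule)
        gamma = 1.0 / t ** 0.75                     # averaging weight (assumed schedule)
        # Surrogate minimizer restricted to the block: one prox-gradient step.
        x_hat_b = soft_threshold(x[b] - alpha * g[b], alpha * lam)
        x[b] = (1 - gamma) * x[b] + gamma * x_hat_b  # averaged block update
    return x
```

For example, `grad_fn` could return a minibatch gradient of a least-squares loss; only the sampled block of the returned gradient is used, so both the data dimension and the feature dimension are subsampled at each step, which is the "doubly stochastic" structure the abstract describes.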