Doubly stochastic algorithms for large-scale optimization

We consider learning problems over training sets in which both, the number of training examples and the dimension of the feature vectors, are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utilizes multiple...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	2016 50th Asilomar Conference on Signals, Systems and Computers s. 1705 - 1709
Hlavní autoři:	Koppel, Alec, Mokhtari, Aryan, Ribeiro, Alejandro
Médium:	Konferenční příspěvek
Jazyk:	angličtina
Vydáno:	IEEE 01.11.2016
Témata:	Approximation algorithms Convergence Delays Indexes Program processors Radio frequency Training
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	We consider learning problems over training sets in which both, the number of training examples and the dimension of the feature vectors, are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utilizes multiple processors to operate in a randomly chosen subset of blocks of the feature vector. We call the algorithm stochastic because processors choose elements of the training set randomly and independently. Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both, the selection of blocks and the selection of elements of the training set. In RAPSA, processors utilize the randomly chosen functions to compute the stochastic gradient component associated with a randomly chosen block. We show that this type of doubly stochastic approximation method, when executed on an asynchronous parallel computing architecture, exhibits comparable convergence behavior to that of classical stochastic gradient descent on strongly convex functions - for diminishing step-sizes, asynchronous RAPSA converges to the minimizer of the expected risk. We illustrate empirical algorithm performance on a linear estimation problem, as well as a binary image classification using the MNIST handwritten digit dataset.
DOI:	10.1109/ACSSC.2016.7869673