Doubly stochastic algorithms for large-scale optimization
| Published in: | 2016 50th Asilomar Conference on Signals, Systems and Computers, pp. 1705 - 1709 |
|---|---|
| Main Authors: | , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.11.2016 |
| Summary: | We consider learning problems over training sets in which both the number of training examples and the dimension of the feature vectors are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utilizes multiple processors to operate on a randomly chosen subset of blocks of the feature vector. We call the algorithm stochastic because processors choose elements of the training set randomly and independently. Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set. In RAPSA, processors use the randomly chosen functions to compute the stochastic gradient component associated with a randomly chosen block. We show that this type of doubly stochastic approximation method, when executed on an asynchronous parallel computing architecture, exhibits convergence behavior comparable to that of classical stochastic gradient descent on strongly convex functions: for diminishing step-sizes, asynchronous RAPSA converges to the minimizer of the expected risk. We illustrate empirical algorithm performance on a linear estimation problem and on binary image classification using the MNIST handwritten digit dataset. |
|---|---|
| DOI: | 10.1109/ACSSC.2016.7869673 |
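
The abstract describes a doubly stochastic update: each processor draws a random block of coordinates and a random subset of training examples, then applies a stochastic gradient step to that block only. Below is a minimal single-process sketch of that idea for a least-squares (linear estimation) objective with a diminishing step-size; the function and parameter names (`rapsa_sketch`, `num_blocks`, `batch_size`, the step-size schedule) are illustrative assumptions, not the authors' implementation, and the asynchronous multi-processor execution is not modeled.

```python
import numpy as np

def rapsa_sketch(A, y, num_blocks=4, batch_size=8, num_iters=2000, step0=1.0, seed=0):
    """Illustrative doubly stochastic gradient sketch (not the authors' code).

    Approximately minimizes the empirical risk (1/2n) * ||A w - y||^2 by repeatedly
    (i) picking a random block of coordinates of w and
    (ii) picking a random minibatch of training examples,
    then updating only the chosen block with the minibatch gradient.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    blocks = np.array_split(np.arange(d), num_blocks)   # coordinate blocks of the feature vector

    for t in range(num_iters):
        step = step0 / (1.0 + 0.01 * t)                  # diminishing step-size (assumed schedule)
        block = blocks[rng.integers(num_blocks)]         # random block of coordinates
        batch = rng.integers(n, size=batch_size)         # random training examples
        # Minibatch gradient restricted to the chosen block of coordinates.
        residual = A[batch] @ w - y[batch]
        grad_block = A[batch][:, block].T @ residual / batch_size
        w[block] -= step * grad_block
    return w

if __name__ == "__main__":
    # Synthetic linear estimation problem, loosely mirroring the paper's first experiment.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((1000, 20))
    w_true = rng.standard_normal(20)
    y = A @ w_true + 0.01 * rng.standard_normal(1000)
    w_hat = rapsa_sketch(A, y)
    print("relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))
```

In the paper's setting each processor would run such block updates concurrently and asynchronously on shared parameters; the sketch above only shows the serial core of one block update.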