CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems

We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:Proceedings - IEEE International Parallel and Distributed Processing Symposium s. 847 - 859
Hlavní autori: Yang You, Demmel, James, Czechowski, Kenneth, Le Song, Vuduc, Richard
Médium: Konferenčný príspevok..
Jazyk:English
Vydavateľské údaje: IEEE 01.05.2015
Predmet:
ISSN:1530-2075
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel is efficiency of a state-of-the-art implementation scaled as W = Omega(P 3 ), where W is the problem size and P the number of processors, this scaling is worse than even a one-dimensional block row dense matrix vector multiplication, which has W = Omega(P 2 ). This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CASVM) method that improves the is efficiency to nearly W = Omega(P). We evaluate these methods on 96 to 1536 processors, and show average speedups of 3 - 16× (7× on average) over Dis-SMO, and a 95% weak-scaling efficiency on six real world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at https://github.com/fastalgo/casvm.
ISSN:1530-2075
DOI:10.1109/IPDPS.2015.117