Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines

We consider the problem of how to design and implement communication-efficient versions of parallel kernel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, i...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transactions on parallel and distributed systems Ročník 28; číslo 4; s. 974 - 988
Hlavní autoři: You, Yang, Demmel, James, Czechowski, Kent, Song, Le, Vuduc, Rich
Médium: Journal Article
Jazyk:angličtina
Vydáno: New York IEEE 01.04.2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:
ISSN:1045-9219, 1558-2183
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:We consider the problem of how to design and implement communication-efficient versions of parallel kernel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as W = Ω(P 3 ), where W is the problem size and P the number of processors; this scaling is worse than even a one-dimensional block row dense matrix vector multiplication, which has W = Ω(P 2 ). This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM method that improves the isoefficiency to nearly W = Ω(P). We evaluate these methods on 96 to 1,536 processors, and show average speedups of 3 - 16x (7× on average) over Dis-SMO, and a 95 percent weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1].
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
USDOE Office of Science (SC)
SC0008700; SC0010200; AC02-05CH11231
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2016.2608823