CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems

Bibliographic details
Published in: Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 847-859
Main authors: Yang You; James Demmel; Kenneth Czechowski; Le Song; Richard Vuduc
Format: Conference proceedings
Language: English
Published: IEEE, 2015-05-01
ISSN: 1530-2075
Abstract: We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines (SVMs), a widely used classifier in statistical machine learning, for distributed-memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as $W = \Omega(P^3)$, where $W$ is the problem size and $P$ is the number of processors. This scaling is worse than even that of a one-dimensional block-row dense matrix-vector multiplication, which has $W = \Omega(P^2)$. This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CA-SVM) method that improves the isoefficiency to nearly $W = \Omega(P)$. We evaluate these methods on 96 to 1536 processors and show speedups of 3× to 16× (7× on average) over Dis-SMO and a 95% weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at https://github.com/fastalgo/casvm.
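The $W = \Omega(P^2)$ baseline cited for one-dimensional block-row matrix-vector multiplication follows from the standard textbook isoefficiency analysis, sketched here for reference. This is not taken from the paper; it assumes an $n \times n$ dense matrix, unit time per multiply-add, $n/P$ rows per processor, and an all-to-all broadcast of the vector costing $t_s \log P + t_w n$ (a hypercube-style communication model). $E$ denotes the target parallel efficiency and $T_o$ the total parallel overhead.

```latex
\begin{align*}
W   &= n^2, \qquad T_P = \frac{n^2}{P} + t_s \log P + t_w n, \\
T_o &= P\,T_P - W = t_s P \log P + t_w n P, \\
W   &= K\,T_o, \qquad K = \frac{E}{1-E}, \\
n^2 &= K t_w n P \;\Rightarrow\; n = K t_w P \;\Rightarrow\; W = K^2 t_w^2 P^2 .
\end{align*}
```

The $t_s$ term alone would only require $W = K t_s P \log P$, and the concurrency limit $P \le n$ independently forces $W = n^2 \ge P^2$, so the bandwidth term dominates and the isoefficiency is $W = \Theta(P^2)$.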
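To make the gap between isoefficiency functions concrete, the short Python sketch below computes how much the problem size would have to grow to keep efficiency constant when moving between the processor counts named in the abstract. It treats each asymptotic bound as an exact proportionality $W = c\,P^k$ (an illustrative assumption that ignores constants), and the helper names `required_growth` and `weak_scaling_efficiency` are hypothetical. The weak-scaling timings are made-up placeholders, not results from the paper.

```python
"""Illustrative arithmetic for isoefficiency functions W = Omega(P**k).

Assumption (for illustration only): each asymptotic bound is treated as an
exact proportionality W = c * P**k, ignoring constants and lower-order terms.
"""


def required_growth(p_old: int, p_new: int, k: int) -> float:
    """Factor by which the problem size W must grow to hold parallel
    efficiency constant when scaling from p_old to p_new processors,
    assuming W grows like P**k."""
    return (p_new / p_old) ** k


def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak-scaling efficiency: runtime at the base processor count divided
    by runtime at the larger count, with work per processor held fixed."""
    return t_base / t_scaled


if __name__ == "__main__":
    p_old, p_new = 96, 1536  # processor range reported in the abstract
    cases = [
        (3, "prior state-of-the-art implementation"),
        (2, "1D block-row dense matrix-vector product"),
        (1, "CA-SVM (nearly)"),
    ]
    for k, label in cases:
        growth = required_growth(p_old, p_new, k)
        print(f"W = Omega(P^{k}), {label}: W must grow {growth:,.0f}x")

    # Hypothetical timings in seconds, NOT measurements from the paper.
    print(f"weak-scaling efficiency: {weak_scaling_efficiency(10.0, 10.5):.2f}")
```

Going from 96 to 1536 processors is a 16-fold increase, so under these assumptions the three isoefficiency functions require the problem size to grow by 4,096×, 256×, and 16×, respectively, to sustain the same efficiency.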
Authors and affiliations:
1. Yang You (you-y12@mails.tsinghua.edu.cn), Dept. of Computer Science and Technology, Tsinghua University, Beijing, China
2. James Demmel (demmel@berkeley.edu), Computer Science Division, University of California at Berkeley, Berkeley, CA, USA
3. Kenneth Czechowski (kentcz@gatech.edu), College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
4. Le Song (lsong@gatech.edu), Computer Science Division, University of California at Berkeley, Berkeley, CA, USA
5. Richard Vuduc (richie@gatech.edu), College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
DOI: 10.1109/IPDPS.2015.117
Discipline: Computer Science
EISBN: 1479986496, 9781479986491
Conference abbreviation: IPDPS
Subject terms: Accuracy; communication avoidance; distributed memory algorithms; Kernel; Mathematical model; Partitioning algorithms; Program processors; statistical machine learning; Support vector machines; Training
URL: https://ieeexplore.ieee.org/document/7161571