CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems
We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which...
| Published in: | Proceedings - IEEE International Parallel and Distributed Processing Symposium pp. 847 - 859 |
|---|---|
| Main Authors: | Yang You, James Demmel, Kenneth Czechowski, Le Song, Richard Vuduc |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.05.2015 |
| Subjects: | |
| ISSN: | 1530-2075 |
| Abstract | We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as $W = \Omega(P^3)$, where $W$ is the problem size and $P$ the number of processors; this scaling is worse than even a one-dimensional block row dense matrix-vector multiplication, which has $W = \Omega(P^2)$. This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CASVM) method that improves the isoefficiency to nearly $W = \Omega(P)$. We evaluate these methods on 96 to 1536 processors and show speedups of 3× to 16× (7× on average) over Dis-SMO, and a 95% weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at https://github.com/fastalgo/casvm. |
|---|---|
| Author | Le Song; Yang You; Demmel, James; Vuduc, Richard; Czechowski, Kenneth |
| Author_xml | – sequence: 1 surname: Yang You fullname: Yang You email: you-y12@mails.tsinghua.edu.cn organization: Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China – sequence: 2 givenname: James surname: Demmel fullname: Demmel, James email: demmel@berkeley.edu organization: Comput. Sci. Div., Univ. of California at Berkeley, Berkeley, CA, USA – sequence: 3 givenname: Kenneth surname: Czechowski fullname: Czechowski, Kenneth email: kentcz@gatech.edu organization: Georgia Inst. of Technol., Coll. of Comput., Atlanta, GA, USA – sequence: 4 surname: Le Song fullname: Le Song email: lsong@gatech.edu organization: Comput. Sci. Div., Univ. of California at Berkeley, Berkeley, CA, USA – sequence: 5 givenname: Richard surname: Vuduc fullname: Vuduc, Richard email: richie@gatech.edu organization: Georgia Inst. of Technol., Coll. of Comput., Atlanta, GA, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/IPDPS.2015.117 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1479986496 9781479986491 |
| EndPage | 859 |
| ExternalDocumentID | 7161571 |
| Genre | orig-research |
| GroupedDBID | 29O 6IE 6IF 6IH 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL |
| ID | FETCH-LOGICAL-i208t-f3228d336101afb2248533d9eb4fe0c5b6cef6dc0515d0f7f77990edf3eedff03 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 31 |
| ISSN | 1530-2075 |
| IngestDate | Wed Aug 27 01:42:48 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i208t-f3228d336101afb2248533d9eb4fe0c5b6cef6dc0515d0f7f77990edf3eedff03 |
| PageCount | 13 |
| ParticipantIDs | ieee_primary_7161571 |
| PublicationCentury | 2000 |
| PublicationDate | 20150501 |
| PublicationDateYYYYMMDD | 2015-05-01 |
| PublicationDate_xml | – month: 05 year: 2015 text: 20150501 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings - IEEE International Parallel and Distributed Processing Symposium |
| PublicationTitleAbbrev | IPDPS |
| PublicationYear | 2015 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib026764926 ssj0020349 |
| Score | 1.7686182 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 847 |
| SubjectTerms | Accuracy; communication avoidance; distributed memory algorithms; Kernel; Mathematical model; Partitioning algorithms; Program processors; statistical machine learning; Support vector machines; Training |
| Title | CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems |
| URI | https://ieeexplore.ieee.org/document/7161571 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
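
The isoefficiency figures quoted in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper or its source code: it assumes each Ω bound is tight and ignores constant factors, and the helper `required_growth` and the table `SCALINGS` are hypothetical names introduced here. It estimates how much the problem size W would have to grow to hold parallel efficiency constant when moving from 96 to 1536 processors, the range evaluated in the abstract.

```python
# Illustrative sketch: treats W = Omega(P**k) as if it were W ~ P**k
# (an assumption; the abstract only states lower bounds).

def required_growth(p_old: int, p_new: int, exponent: int) -> float:
    """Factor by which W must grow to keep efficiency fixed if W ~ P**exponent."""
    return (p_new / p_old) ** exponent

# Exponents as stated in the abstract; the labels are descriptive only.
SCALINGS = {
    "prior state-of-the-art implementation, W ~ P^3": 3,
    "1D block row dense matrix-vector multiply, W ~ P^2": 2,
    "CASVM, nearly W ~ P": 1,
}

for label, exponent in SCALINGS.items():
    factor = required_growth(96, 1536, exponent)
    print(f"{label}: problem size must grow by about {factor:.0f}x")
```

Under these assumptions, the 16-fold increase in processor count calls for roughly 4096 times more work under cubic scaling, 256 times under quadratic scaling, and only 16 times under linear scaling, which is why a near-linear isoefficiency function is the desirable case.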