Fast Discrete Distribution Clustering Using Wasserstein Barycenter With Sparse Support
| Published in: | IEEE Transactions on Signal Processing, Vol. 65, no. 9, pp. 2317-2332 |
|---|---|
| Main Authors: | Jianbo Ye, Panruo Wu, James Z. Wang, Jia Li |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 01.05.2017 |
| Subjects: | |
| ISSN: | 1053-587X, 1941-0476 |
| Abstract | In a variety of research areas, the weighted bag of vectors and the histogram are widely used descriptors for complex objects. Both can be expressed as discrete distributions. D2-clustering pursues the minimum total within-cluster variation for a set of discrete distributions subject to the Kantorovich-Wasserstein metric. D2-clustering has a severe scalability issue, the bottleneck being the computation of a centroid distribution, called Wasserstein barycenter, that minimizes its sum of squared distances to the cluster members. In this paper, we develop a modified Bregman ADMM approach for computing the approximate discrete Wasserstein barycenter of large clusters. In the case when the support points of the barycenters are unknown and have low cardinality, our method achieves high accuracy empirically at a much reduced computational cost. The strengths and weaknesses of our method and its alternatives are examined through experiments, and we recommend scenarios for their respective usage. Moreover, we develop both serial and parallelized versions of the algorithm. By experimenting with large-scale data, we demonstrate the computational efficiency of the new methods and investigate their convergence properties and numerical stability. The clustering results obtained on several datasets in different domains are highly competitive in comparison with some widely used methods in the corresponding areas. |
|---|---|
| Author | Panruo Wu; Jia Li; Wang, James Z.; Jianbo Ye |
| Author_xml | 1. Jianbo Ye (jxy198@ist.psu.edu), Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., University Park, PA, USA; 2. Panruo Wu (pwu011@cs.ucr.edu), Dept. of Comput. Sci. & Eng., Univ. of California, Riverside, Riverside, CA, USA; 3. James Z. Wang (jwang@ist.psu.edu), Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., University Park, PA, USA; 4. Jia Li (jiali@stat.psu.edu), Dept. of Stat., Pennsylvania State Univ., University Park, PA, USA |
| CODEN | ITPRED |
| ContentType | Journal Article |
| DOI | 10.1109/TSP.2017.2659647 |
| Discipline | Engineering |
| EISSN | 1941-0476 |
| EndPage | 2332 |
| ExternalDocumentID | 10_1109_TSP_2017_2659647 7833204 |
| Genre | orig-research |
| GrantInformation_xml | National Science Foundation (NSF), grants CCF-0936948, ACI-1027854, DMS-1521092, ACI-0821527 (CyberStar), and ACI-1053575 (XSEDE); funder ID 10.13039/100000001 |
| ISICitedReferencesCount | 85 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| PageCount | 16 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-05-01 |
| PublicationDecade | 2010 |
| PublicationTitle | IEEE transactions on signal processing |
| PublicationTitleAbbrev | TSP |
| PublicationYear | 2017 |
| Publisher | IEEE |
| StartPage | 2317 |
| SubjectTerms | ADMM; clustering; Clustering algorithms; Discrete distribution; Electronic mail; Histograms; K-means; large-scale learning; Optimization; parallel computing; Prototypes; Signal processing algorithms; Time complexity |
| Title | Fast Discrete Distribution Clustering Using Wasserstein Barycenter With Sparse Support |
| URI | https://ieeexplore.ieee.org/document/7833204 |
| Volume | 65 |