Two provably consistent divide-and-conquer clustering algorithms for large networks
| Published in: | Proceedings of the National Academy of Sciences - PNAS Vol. 118; no. 44 |
|---|---|
| Main Authors: | Mukherjee, Soumendu Sundar; Sarkar, Purnamrita; Bickel, Peter J |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 02.11.2021 |
| Subjects: | |
| ISSN: | 1091-6490 |
| Abstract | In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on several small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they significantly bring down the computational cost of traditional algorithms, including spectral clustering, semidefinite programs, modularity-based methods, likelihood-based methods, etc., without losing accuracy, and even improving accuracy at times. These algorithms are also, by nature, parallelizable. Since most traditional algorithms are accurate, and the corresponding optimization problems are much simpler in small problems, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove the consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real-data analysis to understand the advantages of the divide-and-conquer approach in various settings. |
|---|---|
| Author | Sarkar, Purnamrita; Bickel, Peter J; Mukherjee, Soumendu Sundar |
| Author_xml | 1. Mukherjee, Soumendu Sundar (Department of Statistics, University of California, Berkeley, CA 94720); 2. Sarkar, Purnamrita (Department of Statistics and Data Sciences, University of Texas at Austin, Austin, TX 78705); 3. Bickel, Peter J (ORCID 0000-0001-7480-662X; Department of Statistics, University of California, Berkeley, CA 94720). Correspondence: soumendu041@gmail.com, bickel@stat.berkeley.edu |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34716259$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1007_s11222_025_10620_y crossref_primary_10_1080_10618600_2025_2509588 crossref_primary_10_1016_j_csda_2023_107835 crossref_primary_10_1080_10618600_2024_2432974 crossref_primary_10_1145_3657300 crossref_primary_10_1002_sta4_475 |
| ContentType | Journal Article |
| DBID | NPM 7X8 |
| DOI | 10.1073/pnas.2100482118 |
| DatabaseName | PubMed MEDLINE - Academic |
| DatabaseTitle | PubMed MEDLINE - Academic |
| DatabaseTitleList | PubMed MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Sciences (General) |
| EISSN | 1091-6490 |
| ExternalDocumentID | 34716259 |
| Genre | Research Support, U.S. Gov't, Non-P.H.S.; Research Support, Non-U.S. Gov't; Journal Article |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 10 |
| ISSN | 1091-6490 |
| IngestDate | Wed Oct 01 14:47:30 EDT 2025 Thu Apr 03 07:07:36 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 44 |
| Keywords | networks; clustering; divide-and-conquer |
| Language | English |
| LinkModel | DirectLink |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ORCID | 0000-0001-7480-662X |
| OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/8612351 |
| PMID | 34716259 |
| PQID | 2590136422 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2590136422 pubmed_primary_34716259 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-11-02 |
| PublicationDateYYYYMMDD | 2021-11-02 |
| PublicationDate_xml | – month: 11 year: 2021 text: 2021-11-02 day: 02 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Proceedings of the National Academy of Sciences - PNAS |
| PublicationTitleAlternate | Proc Natl Acad Sci U S A |
| PublicationYear | 2021 |
| SSID | ssj0009580 |
| Score | 2.4901414 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| Title | Two provably consistent divide-and-conquer clustering algorithms for large networks |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/34716259 https://www.proquest.com/docview/2590136422 |
| Volume | 118 |
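
The abstract describes clustering several small subgraphs and then patching the results into a single clustering. This record does not give the details of the two proposed algorithms, so the following is only a minimal sketch of one way such a scheme could look, not the authors' method. It assumes a consensus-averaging rule: for each pair of nodes, average how often the base clusterer put them together across the subgraphs that contain both, then link pairs whose average exceeds a threshold. The names `component_labels` and `patch_by_averaging`, the 0.5 threshold, and the toy graph are all hypothetical, and the base clusterer is a deliberately trivial placeholder (connected components) standing in for spectral clustering or any other traditional method.

```python
import random
from collections import defaultdict


def component_labels(nodes, adj):
    """Placeholder base clusterer (an assumption, not from the paper):
    label each node by its connected component in the subgraph induced on `nodes`."""
    nodes = set(nodes)
    labels = {}
    next_label = 0
    for start in sorted(nodes):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in labels:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels


def patch_by_averaging(n, adj, subgraphs, base_cluster=component_labels, threshold=0.5):
    """Cluster each subgraph separately, average pairwise co-membership,
    and patch the result into one clustering of nodes 0..n-1."""
    together = defaultdict(int)  # times a pair was placed in the same cluster
    seen = defaultdict(int)      # times a pair appeared in the same subgraph
    for nodes in subgraphs:
        labels = base_cluster(nodes, adj)  # independent per subgraph, so parallelizable
        ordered = sorted(nodes)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                pair = (ordered[a], ordered[b])
                seen[pair] += 1
                if labels[pair[0]] == labels[pair[1]]:
                    together[pair] += 1
    # Consensus graph: keep pairs whose averaged co-membership exceeds the threshold.
    consensus = {i: set() for i in range(n)}
    for (i, j), count in seen.items():
        if together[(i, j)] / count > threshold:
            consensus[i].add(j)
            consensus[j].add(i)
    final = component_labels(range(n), consensus)
    return [final[i] for i in range(n)]


if __name__ == "__main__":
    # Toy network (hypothetical data): two disjoint cliques of 5 nodes each.
    n = 10
    adj = {i: set() for i in range(n)}
    for block in (range(0, 5), range(5, 10)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i].add(j)
    rng = random.Random(0)
    # Randomly selected small subgraphs of 4 nodes each.
    subgraphs = [rng.sample(range(n), 4) for _ in range(30)]
    print(patch_by_averaging(n, adj, subgraphs))
```

Because each subgraph is clustered independently, the loop over `subgraphs` could be distributed across workers, which matches the abstract's remark that the approach is parallelizable by nature. Pairs that never appear together in any subgraph receive no consensus edge in this sketch, so the number and size of subgraphs need to be large enough to cover the network.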