Two provably consistent divide-and-conquer clustering algorithms for large networks

In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on several small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they...

Bibliographic Details
Published in: Proceedings of the National Academy of Sciences - PNAS, Vol. 118, no. 44
Main Authors: Mukherjee, Soumendu Sundar, Sarkar, Purnamrita, Bickel, Peter J
Format: Journal Article
Language: English
Published: United States 02.11.2021
ISSN: 1091-6490
Abstract In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on several small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they significantly bring down the computational cost of traditional algorithms, including spectral clustering, semidefinite programs, modularity-based methods, likelihood-based methods, etc., without losing accuracy, and even improving accuracy at times. These algorithms are also, by nature, parallelizable. Since most traditional algorithms are accurate, and the corresponding optimization problems are much simpler in small problems, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove the consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real-data analysis to understand the advantages of the divide-and-conquer approach in various settings.
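The recipe in the abstract (cluster several small subgraphs, then patch the results into one clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the subgraph selection, base algorithm, or patching rule, so every choice below is an assumption made for illustration. The sketch assumes uniformly random node subsets, a stand-in two-community base algorithm (sign of the leading eigenvector of the modularity matrix, found by power iteration), and patching by averaging, over the subgraphs in which a pair of nodes co-occurs, whether the pair was placed in the same cluster. All function names (`sample_sbm`, `leading_eigvec`, `base_cluster`, `divide_and_patch`, `accuracy`) are hypothetical.

```python
import random

def sample_sbm(n, p_in, p_out, rng):
    """Toy two-block stochastic block model (assumed test data, not from the article)."""
    truth = [0 if i < n // 2 else 1 for i in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < (p_in if truth[i] == truth[j] else p_out):
                adj[i][j] = adj[j][i] = 1
    return adj, truth

def leading_eigvec(mat, iters=100):
    """Power iteration for the algebraically largest eigenvalue of a symmetric matrix.
    The shift (a Gershgorin bound plus 1) makes every shifted eigenvalue positive."""
    shift = 1.0 + max(sum(abs(x) for x in row) for row in mat)
    v = [1.0 + 0.01 * i for i in range(len(mat))]  # deterministic, non-constant start
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) + shift * v[i]
             for i, row in enumerate(mat)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def base_cluster(sub_adj):
    """Stand-in base algorithm: two-way split by the sign of the leading
    eigenvector of the modularity matrix A - d d^T / (2m)."""
    deg = [sum(row) for row in sub_adj]
    two_m = max(1, sum(deg))  # guard against an edgeless subgraph
    k = len(sub_adj)
    mod = [[sub_adj[i][j] - deg[i] * deg[j] / two_m for j in range(k)]
           for i in range(k)]
    return [1 if x > 0 else 0 for x in leading_eigvec(mod)]

def divide_and_patch(adj, num_subgraphs, size, rng):
    """Cluster random subgraphs, then patch via averaged co-clustering indicators."""
    n = len(adj)
    same = [[0] * n for _ in range(n)]  # times a pair was put in the same cluster
    seen = [[0] * n for _ in range(n)]  # times a pair appeared in one subgraph
    for _ in range(num_subgraphs):
        nodes = rng.sample(range(n), size)
        sub = [[adj[u][v] for v in nodes] for u in nodes]
        lab = base_cluster(sub)
        for a, u in enumerate(nodes):
            for b, v in enumerate(nodes):
                seen[u][v] += 1
                same[u][v] += lab[a] == lab[b]
    # Averaged co-clustering matrix centred at 0.5; unseen pairs contribute 0.
    centred = [[same[u][v] / seen[u][v] - 0.5 if seen[u][v] else 0.0
                for v in range(n)] for u in range(n)]
    return [1 if x > 0 else 0 for x in leading_eigvec(centred)]

def accuracy(pred, truth):
    """Fraction of correctly labelled nodes, up to swapping the two labels."""
    agree = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    return max(agree, 1 - agree)

rng = random.Random(0)
adj, truth = sample_sbm(80, 0.7, 0.05, rng)
pred = divide_and_patch(adj, num_subgraphs=20, size=40, rng=rng)
print("accuracy:", accuracy(pred, truth))
```

Each subgraph is clustered independently, so the loop over subgraphs could run in parallel, which is the property the abstract highlights. Averaging pairwise co-clustering indicators sidesteps the fact that cluster labels from different subgraphs are only defined up to permutation; other patching rules are possible.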
Author_xml – sequence: 1
  givenname: Soumendu Sundar
  surname: Mukherjee
  fullname: Mukherjee, Soumendu Sundar
  email: soumendu041@gmail.com, bickel@stat.berkeley.edu
  organization: Department of Statistics, University of California, Berkeley, CA 94720
– sequence: 2
  givenname: Purnamrita
  surname: Sarkar
  fullname: Sarkar, Purnamrita
  organization: Department of Statistics and Data Sciences, University of Texas at Austin, Austin, TX 78705
– sequence: 3
  givenname: Peter J
  orcidid: 0000-0001-7480-662X
  surname: Bickel
  fullname: Bickel, Peter J
  email: soumendu041@gmail.com, bickel@stat.berkeley.edu
  organization: Department of Statistics, University of California, Berkeley, CA 94720
BackLink https://www.ncbi.nlm.nih.gov/pubmed/34716259 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1007_s11222_025_10620_y
crossref_primary_10_1080_10618600_2025_2509588
crossref_primary_10_1016_j_csda_2023_107835
crossref_primary_10_1080_10618600_2024_2432974
crossref_primary_10_1145_3657300
crossref_primary_10_1002_sta4_475
ContentType Journal Article
DOI 10.1073/pnas.2100482118
Discipline Sciences (General)
EISSN 1091-6490
ExternalDocumentID 34716259
Genre Research Support, U.S. Gov't, Non-P.H.S
Research Support, Non-U.S. Gov't
Journal Article
ISICitedReferencesCount 10
ISSN 1091-6490
IngestDate Wed Oct 01 14:47:30 EDT 2025
Thu Apr 03 07:07:36 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 44
Keywords networks
clustering
divide-and-conquer
Language English
ORCID 0000-0001-7480-662X
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/8612351
PMID 34716259
PublicationCentury 2000
PublicationDate 2021-11-02
PublicationDateYYYYMMDD 2021-11-02
PublicationDate_xml – month: 11
  year: 2021
  text: 2021-11-02
  day: 02
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Proceedings of the National Academy of Sciences - PNAS
PublicationTitleAlternate Proc Natl Acad Sci U S A
PublicationYear 2021
Title Two provably consistent divide-and-conquer clustering algorithms for large networks
URI https://www.ncbi.nlm.nih.gov/pubmed/34716259
https://www.proquest.com/docview/2590136422
Volume 118