Near-optimal large-scale k-medoids clustering
The k-medoids (k-median) problem is one of the best known unsupervised clustering problems. Due to its complexity, finding high-quality solutions for huge-scale datasets remains extremely challenging. The application of many approaches finding optimal or quality solutions is limited to only small an...
| Published in: | Information sciences Vol. 545; pp. 344 - 362 |
|---|---|
| Main authors: | Ushakov, Anton V.; Vasilyev, Igor |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Inc, 2021-02-04 |
| Keywords: | |
| ISSN: | 0020-0255, 1872-6291 |
| Online access: | Full text |
| Abstract | The k-medoids (k-median) problem is one of the best known unsupervised clustering problems. Due to its complexity, finding high-quality solutions for huge-scale datasets remains extremely challenging. Many approaches that find optimal or high-quality solutions are limited to small and medium-size instances. On the other hand, many parallel, distributed algorithms that can handle huge-scale datasets usually provide very poor solutions. In this paper, we develop a first parallel, distributed primal–dual heuristic algorithm for the k-medoids problem. Its main component is a very efficient parallel subgradient column generation that solves a Lagrangian dual problem and finds a tight bound on solution quality. High-quality solutions are then produced by a parallel core selection technique. We considerably reduce computational burden and memory load by employing a nearest neighbor strategy to approximate the dissimilarity matrix. We demonstrate that our algorithm finds near-optimal solutions, as confirmed by the tightness of the dual bounds, for instances that are much larger than those considered in the literature to date. Our experiments include clustering large-scale collections of face images into several thousand clusters. We show that our approach outperforms parallel improved versions of the most popular k-medoids clustering algorithms, achieving nearly linear parallel speedup. |
|---|---|
| Author | Vasilyev, Igor Ushakov, Anton V. |
| Author_xml | – sequence: 1 givenname: Anton V. surname: Ushakov fullname: Ushakov, Anton V. email: aushakov@icc.ru organization: Matrosov Institute for System Dynamics and Control Theory of the Siberian Branch of the Russian Academy of Sciences, 134 Lermontov Str., 664033 Irkutsk, Russia – sequence: 2 givenname: Igor surname: Vasilyev fullname: Vasilyev, Igor email: vil@icc.ru organization: Matrosov Institute for System Dynamics and Control Theory of the Siberian Branch of the Russian Academy of Sciences, 134 Lermontov Str., 664033 Irkutsk, Russia |
| CitedBy_id | crossref_primary_10_1093_comjnl_bxab206 crossref_primary_10_1016_j_jcmds_2022_100034 crossref_primary_10_1016_j_engappai_2025_111082 crossref_primary_10_1007_s11036_023_02249_w crossref_primary_10_1155_2022_3881265 crossref_primary_10_1007_s10489_025_06813_7 crossref_primary_10_1007_s00371_025_04054_w crossref_primary_10_1093_comjnl_bxaf041 crossref_primary_10_1016_j_neucom_2025_131225 crossref_primary_10_1016_j_patcog_2024_111062 crossref_primary_10_3390_math10203807 crossref_primary_10_1016_j_oceaneng_2024_119439 crossref_primary_10_1016_j_segan_2022_100658 crossref_primary_10_3390_en14133735 crossref_primary_10_1061_JPSEA2_PSENG_1439 crossref_primary_10_3390_a16090396 crossref_primary_10_1016_j_neucom_2025_129705 crossref_primary_10_1109_ACCESS_2021_3130551 crossref_primary_10_1177_14727978251361552 crossref_primary_10_1007_s11042_023_14873_5 crossref_primary_10_1016_j_eswa_2023_120278 crossref_primary_10_1007_s10462_022_10325_y crossref_primary_10_1016_j_neucom_2025_131576 crossref_primary_10_1088_1757_899X_1088_1_012034 crossref_primary_10_1002_int_22714 crossref_primary_10_1109_JSEN_2024_3392594 crossref_primary_10_1016_j_ins_2021_05_070 crossref_primary_10_3233_IDT_230123 crossref_primary_10_1016_j_asoc_2021_107924 crossref_primary_10_1364_AO_520256 crossref_primary_10_1186_s12885_024_13387_z crossref_primary_10_1109_TNSE_2024_3402383 crossref_primary_10_1016_j_ins_2022_09_011 crossref_primary_10_1134_S1990478921040128 crossref_primary_10_1109_ACCESS_2024_3469369 crossref_primary_10_1016_j_ejor_2022_11_033 crossref_primary_10_1109_TETCI_2023_3336537 crossref_primary_10_1016_j_agrformet_2025_110426 crossref_primary_10_3233_JIFS_222721 crossref_primary_10_1007_s42154_022_00205_0 crossref_primary_10_1155_2022_8566253 crossref_primary_10_3390_math12193083 crossref_primary_10_1002_sim_70109 crossref_primary_10_1016_j_energy_2022_126100 crossref_primary_10_1016_j_ins_2021_02_008 crossref_primary_10_1061_JPSEA2_PSENG_1611 
crossref_primary_10_1016_j_ins_2023_01_122 crossref_primary_10_1016_j_apgeog_2024_103428 crossref_primary_10_3390_math10142519 crossref_primary_10_1016_j_cor_2023_106181 crossref_primary_10_1016_j_ins_2021_12_016 crossref_primary_10_1016_j_oceaneng_2023_114912 |
| Cites_doi | 10.1023/A:1015013919497 10.1109/CVPR.2015.7298596 10.1016/S0167-8191(03)00043-7 10.1109/LSP.2016.2603342 10.1016/0377-2217(93)90118-7 10.1016/j.ejor.2014.01.050 10.1287/ijoc.1100.0418 10.1016/j.ejor.2005.05.034 10.1145/2020408.2020515 10.1007/s10479-014-1744-x 10.1016/j.knosys.2012.10.012 10.1016/j.eswa.2017.09.052 10.1016/j.cor.2011.09.016 10.1126/science.1136800 10.1137/S0097539702416402 10.1023/B:HEUR.0000026897.40171.1a 10.1016/S0966-8349(98)00030-8 10.1016/j.eswa.2010.07.030 10.1016/j.procs.2016.05.446 10.1145/3097983.3098098 10.1137/0213014 10.1109/TIFS.2018.2796999 10.1145/2688073.2688116 10.1057/jors.1964.47 10.1145/375827.375845 10.1016/S1571-0653(04)00427-5 10.1093/bioinformatics/btv032 10.1109/TPAMI.2010.88 10.1137/0137041 10.1007/s10107-005-0700-6 10.1007/s10732-006-7284-z 10.1016/j.eswa.2008.01.039 10.1609/socs.v4i1.18282 10.1109/FG.2018.00020 10.1109/TPAMI.2017.2679100 10.1007/s10732-008-9078-y 10.1007/s10618-009-0135-4 |
| ContentType | Journal Article |
| Copyright | 2020 Elsevier Inc. |
| Copyright_xml | – notice: 2020 Elsevier Inc. |
| DBID | AAYXX CITATION |
| DOI | 10.1016/j.ins.2020.08.121 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering Library & Information Science |
| EISSN | 1872-6291 |
| EndPage | 362 |
| ExternalDocumentID | 10_1016_j_ins_2020_08_121 S0020025520308872 |
| GroupedDBID | --K --M --Z -~X .DC .~1 0R~ 1B1 1RT 1~. 1~5 4.4 457 4G. 5GY 5VS 7-5 71M 8P~ 9JN 9JO AAAKF AABNK AACTN AAEDT AAEDW AAIAV AAIKJ AAKOC AALRI AAOAW AAQFI AARIN AAXUO AAYFN ABAOU ABBOA ABFNM ABJNI ABMAC ABUCO ABYKQ ACAZW ACDAQ ACGFS ACRLP ACZNC ADBBV ADEZE ADGUI ADTZH AEBSH AECPX AEKER AENEX AFKWA AFTJW AGHFR AGUBO AGYEJ AHHHB AHJVU AHZHX AIALX AIEXJ AIGVJ AIKHN AITUG AJOXV ALMA_UNASSIGNED_HOLDINGS AMFUW AMRAJ AOUOD APLSM ARUGR AXJTR BJAXD BKOJK BLXMC CS3 DU5 EBS EFJIC EFLBG EO8 EO9 EP2 EP3 F5P FDB FIRID FNPLU FYGXN G-Q GBLVA GBOLZ HAMUX IHE J1W JJJVA KOM LG9 LY1 M41 MHUIS MO0 MS~ N9A O-L O9- OAUVE OZT P-8 P-9 P2P PC. Q38 ROL RPZ SDF SDG SDP SES SPC SPCBC SSB SSD SST SSV SSW SSZ T5K TN5 TWZ WH7 XPP ZMT ~02 ~G- 1OL 29I 77I 9DU AAAKG AAQXK AATTM AAXKI AAYWO AAYXX ABEFU ABWVN ABXDB ACLOT ACNNM ACRPL ACVFH ADCNI ADJOM ADMUD ADNMO ADVLN AEIPS AEUPX AFFNX AFJKZ AFPUW AGQPQ AIGII AIIUN AKBMS AKRWK AKYEP ANKPU APXCP ASPBG AVWKF AZFZN CITATION EFKBS EJD FEDTE FGOYB HLZ HVGLF HZ~ H~9 R2- SBC SDS SEW UHS WUQ YYP ZY4 ~HD |
| ID | FETCH-LOGICAL-c297t-53703aef027a16a3c8ac54a5dcff927eac1d61539790b98aee83cc11be9111bb3 |
| ISICitedReferencesCount | 58 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000592406000021&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0020-0255 |
| IngestDate | Tue Nov 18 22:09:45 EST 2025 Sat Nov 29 07:30:32 EST 2025 Fri Feb 23 02:48:50 EST 2024 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | P-median problem Parallel and distributed computing CLARA K-medoids clustering Nearest neighbors MPI |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c297t-53703aef027a16a3c8ac54a5dcff927eac1d61539790b98aee83cc11be9111bb3 |
| PageCount | 19 |
| ParticipantIDs | crossref_citationtrail_10_1016_j_ins_2020_08_121 crossref_primary_10_1016_j_ins_2020_08_121 elsevier_sciencedirect_doi_10_1016_j_ins_2020_08_121 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-02-04 |
| PublicationDateYYYYMMDD | 2021-02-04 |
| PublicationDate_xml | – month: 02 year: 2021 text: 2021-02-04 day: 04 |
| PublicationDecade | 2020 |
| PublicationTitle | Information sciences |
| PublicationYear | 2021 |
| Publisher | Elsevier Inc |
| Publisher_xml | – name: Elsevier Inc |
| References | García, Labbé, Marín (b0085) 2011; 23 Geoffrion (b0100) 1974; vol. 2 A. Arbelaez, L. Quesada. Parallelising the k-medoids clustering problem using space-partitioning, in: M. Helmert and G. Röger, editors, Proc. the 6th Annual Symposium on Combinatorial Search, SoCS 2013, pages 20–28. AAAI, 2013. Yu, Liu, Guo, Liu (b0230) 2018; 92 Megiddo, Supowit (b0155) 1984; 13 Rangel, Hendrix, Agrawal, Liao, Choudhary (b0200) 2016; 80 Lai, Fu (b0135) 2011; 38 Otto, Wang, Jain (b0180) 2018; 40 Zhu, Wang, Shan, Lv (b0250) 2014 Avella, Boccia, Sforza, Vasilyev (b0025) 2008; 15 Maze, Adams, Duncan (b0150) 2018 Zhang, Couloigner (b0245) 2005 Zhang, Zhang, Li, Qiao (b0240) 2016; 23 Norvill, Fiz Pontiveros, State, Awan, Cullen (b0175) 2017 Irkutsk supercomputer center of SB RAS. URL:http://hpc.icc.ru/en/. Accessed: 2020-05-19. Avella, Sassano, Vasilyev (b0035) 2007; 109 Redondo, Marín, Ortigosa (b0205) 2016; 246 Shi, Otto, Jain (b0215) 2018; 13 Chen, Song, Bai, Lin, Chang (b0060) 2011; 33 Frahm (b0075) 2010 Sheng, Liu (b0210) 2006; 12 F. Garcia-López, B. Melián-Batista, J.A. Moreno-Pérez, J.M. Moreno-Vega. Parallelization of the scatter search for the p-median problem. Parallel Comput., 29(5):575–589, 2003. Parallel computing in logistics. Jain, Vazirani (b0125) 2001; 48 Mladenović, Brimberg, Hansen, Moreno-Pérez (b0165) 2007; 179 Arya, Garg, Khandekar, Meyerson, Munagala, Pandit (b0015) 2004; 33 Paterlini, Nascimento, Traina (b0195) 2011; 2 Wang, Wang, Wilkes (b0225) 2020 Mu, Tong (b0170) 2019 Q. Cao, L. Shen, W. Xie, O.M. Parkhi, A. Zisserman. VGGFace2: A dataset for recognising faces across pose and age, in Proc. 13th IEEE International Conference on Automatic Face & Gesture Recognition, FG 2018, pages 67–74. IEEE, 2018. Garcia-López, Melián-Batista, Moreno-Pérez, Moreno-Vega (b0090) 2002; 8 Irawan, Salhi, Scaparra (b0120) 2014; 237 Zadegan, Mirzaie, Sadoughi (b0235) 2013; 39 Mirzasoleiman, Karbasi, Sarkar, Krause (b0160) 2013 A. Ene, S. Im, B. Moseley. 
Fast clustering using MapReduce, in: Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, pages 681–689, New York, 2011. ACM. H.-S. Park and C.-H. Jun. A simple and fast algorithm for k-medoids clustering. Expert. Syst. Appl., 36(2, Part 2):3336–3341, 2009. Hansen, Brimberg, Urosević, Mladenović (b0110) 2009; 19 Parkhi, Vedaldi, Zisserman (b0190) 2015 Ayyala, Lin (b0045) 2015; 31 Maranzana (b0145) 1964; 15 Y. Gong, M. Pawlowski, F. Yang, L. Brandy, L. Boundev, R. Fergus. Web scale photo hash clustering on a single machine, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 19–27, June 2015. H. Song, J.-G. Lee, W.-S. Han. Pamae: Parallel k-medoids clustering with high accuracy and efficiency, in: Proc. 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, pages 1087–1096, New York, 2017. ACM. P. Awasthi, A.S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, R. Ward. Relax, no need to round: Integrality of clustering formulations, in: Proc. the 2015 Conference on Innovations in Theoretical Computer Science, ITCS ’15, pages 191–200, New York, 2015. ACM. P. Avella, A. Sassano, I. Vasil’ev. A heuristic for large-scale p-median instances. Electron. Notes in Discrete Math., 13(0):14–17, 2003. 2nd Cologne-Twente Workshop on Graphs and Combinatorial Optimization. 
Crainic, Gendreau, Hansen, Mladenović (b0065) 2004; 10 Kariv, Hakimi (b0130) 1979; 37 Frey, Dueck (b0080) 2007; 315 Avella, Boccia, Salerno, Vasilyev (b0020) 2012; 39 Beasley (b0050) 1993; 65 Hansen, Mladenović (b0115) 1997; 5 Manning, Raghavan, Schütze (b0140) 2008 Zhu (10.1016/j.ins.2020.08.121_b0250) 2014 Avella (10.1016/j.ins.2020.08.121_b0035) 2007; 109 Kariv (10.1016/j.ins.2020.08.121_b0130) 1979; 37 Maranzana (10.1016/j.ins.2020.08.121_b0145) 1964; 15 Crainic (10.1016/j.ins.2020.08.121_b0065) 2004; 10 Maze (10.1016/j.ins.2020.08.121_b0150) 2018 Jain (10.1016/j.ins.2020.08.121_b0125) 2001; 48 Mladenović (10.1016/j.ins.2020.08.121_b0165) 2007; 179 10.1016/j.ins.2020.08.121_b0220 Wang (10.1016/j.ins.2020.08.121_b0225) 2020 Frey (10.1016/j.ins.2020.08.121_b0080) 2007; 315 10.1016/j.ins.2020.08.121_b0185 Rangel (10.1016/j.ins.2020.08.121_b0200) 2016; 80 Frahm (10.1016/j.ins.2020.08.121_b0075) 2010 Geoffrion (10.1016/j.ins.2020.08.121_b0100) 1974; vol. 2 Arya (10.1016/j.ins.2020.08.121_b0015) 2004; 33 Avella (10.1016/j.ins.2020.08.121_b0025) 2008; 15 Chen (10.1016/j.ins.2020.08.121_b0060) 2011; 33 10.1016/j.ins.2020.08.121_b0095 10.1016/j.ins.2020.08.121_b0010 10.1016/j.ins.2020.08.121_b0055 Sheng (10.1016/j.ins.2020.08.121_b0210) 2006; 12 10.1016/j.ins.2020.08.121_b0005 Hansen (10.1016/j.ins.2020.08.121_b0115) 1997; 5 Garcia-López (10.1016/j.ins.2020.08.121_b0090) 2002; 8 Yu (10.1016/j.ins.2020.08.121_b0230) 2018; 92 Zadegan (10.1016/j.ins.2020.08.121_b0235) 2013; 39 Megiddo (10.1016/j.ins.2020.08.121_b0155) 1984; 13 10.1016/j.ins.2020.08.121_b0040 Shi (10.1016/j.ins.2020.08.121_b0215) 2018; 13 Mu (10.1016/j.ins.2020.08.121_b0170) 2019 Paterlini (10.1016/j.ins.2020.08.121_b0195) 2011; 2 Parkhi (10.1016/j.ins.2020.08.121_b0190) 2015 Zhang (10.1016/j.ins.2020.08.121_b0240) 2016; 23 Hansen (10.1016/j.ins.2020.08.121_b0110) 2009; 19 Otto (10.1016/j.ins.2020.08.121_b0180) 2018; 40 García (10.1016/j.ins.2020.08.121_b0085) 2011; 23 Norvill 
(10.1016/j.ins.2020.08.121_b0175) 2017 Avella (10.1016/j.ins.2020.08.121_b0020) 2012; 39 Manning (10.1016/j.ins.2020.08.121_b0140) 2008 Beasley (10.1016/j.ins.2020.08.121_b0050) 1993; 65 10.1016/j.ins.2020.08.121_b0070 10.1016/j.ins.2020.08.121_b0030 Irawan (10.1016/j.ins.2020.08.121_b0120) 2014; 237 Zhang (10.1016/j.ins.2020.08.121_b0245) 2005 Mirzasoleiman (10.1016/j.ins.2020.08.121_b0160) 2013 Redondo (10.1016/j.ins.2020.08.121_b0205) 2016; 246 10.1016/j.ins.2020.08.121_b0105 Ayyala (10.1016/j.ins.2020.08.121_b0045) 2015; 31 Lai (10.1016/j.ins.2020.08.121_b0135) 2011; 38 |
| References_xml | – volume: 39 start-page: 1625 year: 2012 end-page: 1632 ident: b0020 article-title: An aggregation heuristic for large scale p-median problem publication-title: Comput. Oper. Res. – reference: H.-S. Park and C.-H. Jun. A simple and fast algorithm for k-medoids clustering. Expert. Syst. Appl., 36(2, Part 2):3336–3341, 2009. – volume: 37 start-page: 539 year: 1979 end-page: 560 ident: b0130 article-title: An algorithmic approach to network location problems. ii: The p-medians publication-title: SIAM J. Appl. Math. – volume: 237 start-page: 590 year: 2014 end-page: 605 ident: b0120 article-title: An adaptive multiphase approach for large unconditional and conditional p-median problems publication-title: Eur. J. Oper. Res. – volume: 5 start-page: 207 year: 1997 end-page: 226 ident: b0115 article-title: Variable neighborhood search for the p-median publication-title: Locat. Sci. – volume: 40 start-page: 289 year: 2018 end-page: 303 ident: b0180 article-title: Clustering millions of faces by identity publication-title: IEEE Trans. Pattern Anal. Mach. Intell. – volume: 13 start-page: 1626 year: 2018 end-page: 1640 ident: b0215 article-title: Face clustering: Representation and pairwise constraints publication-title: IEEE Trans. Inf. Forensics Secur. – volume: 80 start-page: 1159 year: 2016 end-page: 1169 ident: b0200 article-title: AGORAS: A fast algorithm for estimating medoids in large datasets publication-title: Procedia Comput. Sci. – reference: Irkutsk supercomputer center of SB RAS. URL:http://hpc.icc.ru/en/. Accessed: 2020-05-19. – volume: vol. 
2 start-page: 82 year: 1974 end-page: 114 ident: b0100 article-title: Lagrangean relaxation for integer programming publication-title: Approaches to Integer Programming – start-page: 368 year: 2010 end-page: 381 ident: b0075 article-title: Building rome on a cloudless day publication-title: Computer Vision - ECCV 2010 – volume: 23 start-page: 546 year: 2011 end-page: 556 ident: b0085 article-title: Solving large p-median problems with a radius formulation publication-title: INFORMS J. Comput. – volume: 65 start-page: 383 year: 1993 end-page: 399 ident: b0050 article-title: Lagrangean heuristics for location problems publication-title: Eur. J. Oper. Res. – start-page: 85 year: 2020 end-page: 108 ident: b0225 article-title: Machine Learning-based Natural Scene Recognition for Mobile Robot Localization in An Unknown Environment, chapter An Efficient K-Medoids Clustering Algorithm for Large Scale Data – year: 2019 ident: b0170 article-title: On solving large p-median problems publication-title: Environ. Plan. B - Plan. Des. – reference: H. Song, J.-G. Lee, W.-S. Han. Pamae: Parallel k-medoids clustering with high accuracy and efficiency, in: Proc. 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, pages 1087–1096, New York, 2017. ACM. – volume: 109 start-page: 89 year: 2007 end-page: 114 ident: b0035 article-title: Computational study of large-scale p-median problems publication-title: Math. Program. – start-page: 181 year: 2005 end-page: 189 ident: b0245 article-title: A new and efficient k-medoid algorithm for spatial clustering publication-title: Computational Science and Its Applications – ICCSA 2005 – start-page: 573 year: 2014 end-page: 577 ident: b0250 article-title: K-medoids clustering based on mapreduce and optimal search of medoids publication-title: 2014 9th International Conference on Computer Science Education – reference: P. Avella, A. Sassano, I. Vasil’ev. A heuristic for large-scale p-median instances. Electron. 
Notes in Discrete Math., 13(0):14–17, 2003. 2nd Cologne-Twente Workshop on Graphs and Combinatorial Optimization. – start-page: 158 year: 2018 end-page: 165 ident: b0150 article-title: IARPA Janus Benchmark - C: Face dataset and protocol publication-title: Proc. 2018 International Conference on Biometrics (ICB) – volume: 315 start-page: 972 year: 2007 end-page: 976 ident: b0080 article-title: Clustering by passing messages between data points publication-title: Science – volume: 23 start-page: 1499 year: 2016 end-page: 1503 ident: b0240 article-title: Joint face detection and alignment using multitask cascaded convolutional networks publication-title: IEEE Signal Process. Lett. – reference: P. Awasthi, A.S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, R. Ward. Relax, no need to round: Integrality of clustering formulations, in: Proc. the 2015 Conference on Innovations in Theoretical Computer Science, ITCS ’15, pages 191–200, New York, 2015. ACM. – volume: 39 start-page: 133 year: 2013 end-page: 143 ident: b0235 article-title: Ranked k-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets publication-title: Knowl.-Based Syst. – volume: 31 start-page: 1648 year: 2015 end-page: 1654 ident: b0045 article-title: GrammR: graphical representation and modeling of count data with application in metagenomics publication-title: Bioinformatics – volume: 38 start-page: 764 year: 2011 end-page: 775 ident: b0135 article-title: Variance enhanced k-medoid clustering publication-title: Expert Syst. Appl. – volume: 15 start-page: 261 year: 1964 end-page: 270 ident: b0145 article-title: On the location of supply points to minimize transport costs publication-title: Oper. Res. Quart. – volume: 246 start-page: 253 year: 2016 end-page: 272 ident: b0205 article-title: A parallelized lagrangean relaxation approach for the discrete ordered median problem publication-title: Ann. Oper. Res. – reference: A. Arbelaez, L. Quesada. 
Parallelising the k-medoids clustering problem using space-partitioning, in: M. Helmert and G. Röger, editors, Proc. the 6th Annual Symposium on Combinatorial Search, SoCS 2013, pages 20–28. AAAI, 2013. – volume: 48 start-page: 274 year: 2001 end-page: 296 ident: b0125 article-title: Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation publication-title: J. ACM – reference: F. Garcia-López, B. Melián-Batista, J.A. Moreno-Pérez, J.M. Moreno-Vega. Parallelization of the scatter search for the p-median problem. Parallel Comput., 29(5):575–589, 2003. Parallel computing in logistics. – reference: Q. Cao, L. Shen, W. Xie, O.M. Parkhi, A. Zisserman. VGGFace2: A dataset for recognising faces across pose and age, in Proc. 13th IEEE International Conference on Automatic Face & Gesture Recognition, FG 2018, pages 67–74. IEEE, 2018. – start-page: 41.1 year: 2015 end-page: 41.12 ident: b0190 publication-title: Deep face recognition Proc. the British Machine Vision Conference (BMVC) – volume: 10 start-page: 293 year: 2004 end-page: 314 ident: b0065 article-title: Cooperative parallel variable neighborhood search for the p-median publication-title: J. Heuristics – volume: 33 start-page: 544 year: 2004 end-page: 562 ident: b0015 article-title: Local search heuristics for k-median and facility location problems publication-title: SIAM J. Comput. – volume: 12 start-page: 447 year: 2006 end-page: 466 ident: b0210 article-title: A genetic k-medoids clustering algorithm publication-title: J. Heuristics – start-page: 1 year: 2017 end-page: 6 ident: b0175 article-title: Automated labeling of unknown contracts in ethereum publication-title: Proc. 26th International Conference on Computer Communication and Networks – volume: 19 start-page: 351 year: 2009 end-page: 375 ident: b0110 article-title: Solving large p-median clustering problems by primal-dual variable neighborhood search publication-title: Data Min. 
Knowl. Discov. – start-page: 2049 year: 2013 end-page: 2057 ident: b0160 article-title: Distributed submodular maximization: Identifying representative elements in massive data publication-title: Proc. 26th International Conference on Neural Information Processing Systems, volume 2 of NIPS’13 – volume: 13 start-page: 182 year: 1984 end-page: 196 ident: b0155 article-title: On the complexity of some common geometric location problems publication-title: SIAM J. Comput. – reference: A. Ene, S. Im, B. Moseley. Fast clustering using MapReduce, in: Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, pages 681–689, New York, 2011. ACM. – volume: 15 start-page: 597 year: 2008 end-page: 615 ident: b0025 article-title: An effective heuristic for large-scale capacitated facility location problems publication-title: J. Heuristics – volume: 33 start-page: 568 year: 2011 end-page: 586 ident: b0060 article-title: Parallel spectral clustering in distributed systems publication-title: IEEE Trans. Pattern Anal. Mach. Intell. – volume: 92 start-page: 464 year: 2018 end-page: 473 ident: b0230 article-title: An improved k-medoids algorithm based on step increasing and optimizing medoids publication-title: Expert Syst. Appl. – volume: 8 start-page: 375 year: 2002 end-page: 388 ident: b0090 article-title: The parallel variable neighborhood search for the p-median problem publication-title: J. Heuristics – volume: 2 start-page: 221 year: 2011 end-page: 236 ident: b0195 article-title: Using pivots to speed-up k-medoids clustering publication-title: JIDM – reference: Y. Gong, M. Pawlowski, F. Yang, L. Brandy, L. Boundev, R. Fergus. Web scale photo hash clustering on a single machine, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 19–27, June 2015. 
– year: 2008 ident: b0140 article-title: Introduction to Information Retrieval – volume: 179 start-page: 927 year: 2007 end-page: 939 ident: b0165 article-title: The p-median problem: A survey of metaheuristic approaches publication-title: Eur. J. Oper. Res. – volume: 8 start-page: 375 issue: 3 year: 2002 ident: 10.1016/j.ins.2020.08.121_b0090 article-title: The parallel variable neighborhood search for the p-median problem publication-title: J. Heuristics doi: 10.1023/A:1015013919497 – ident: 10.1016/j.ins.2020.08.121_b0105 doi: 10.1109/CVPR.2015.7298596 – ident: 10.1016/j.ins.2020.08.121_b0095 doi: 10.1016/S0167-8191(03)00043-7 – start-page: 85 year: 2020 ident: 10.1016/j.ins.2020.08.121_b0225 – volume: 23 start-page: 1499 issue: 10 year: 2016 ident: 10.1016/j.ins.2020.08.121_b0240 article-title: Joint face detection and alignment using multitask cascaded convolutional networks publication-title: IEEE Signal Process. Lett. doi: 10.1109/LSP.2016.2603342 – volume: 65 start-page: 383 issue: 3 year: 1993 ident: 10.1016/j.ins.2020.08.121_b0050 article-title: Lagrangean heuristics for location problems publication-title: Eur. J. Oper. Res. doi: 10.1016/0377-2217(93)90118-7 – start-page: 158 year: 2018 ident: 10.1016/j.ins.2020.08.121_b0150 article-title: IARPA Janus Benchmark - C: Face dataset and protocol – volume: 237 start-page: 590 issue: 2 year: 2014 ident: 10.1016/j.ins.2020.08.121_b0120 article-title: An adaptive multiphase approach for large unconditional and conditional p-median problems publication-title: Eur. J. Oper. Res. doi: 10.1016/j.ejor.2014.01.050 – ident: 10.1016/j.ins.2020.08.121_b0005 – volume: 23 start-page: 546 issue: 4 year: 2011 ident: 10.1016/j.ins.2020.08.121_b0085 article-title: Solving large p-median problems with a radius formulation publication-title: INFORMS J. Comput. 
doi: 10.1287/ijoc.1100.0418 – volume: 179 start-page: 927 issue: 3 year: 2007 ident: 10.1016/j.ins.2020.08.121_b0165 article-title: The p-median problem: A survey of metaheuristic approaches publication-title: Eur. J. Oper. Res. doi: 10.1016/j.ejor.2005.05.034 – start-page: 1 year: 2017 ident: 10.1016/j.ins.2020.08.121_b0175 article-title: Automated labeling of unknown contracts in ethereum – ident: 10.1016/j.ins.2020.08.121_b0070 doi: 10.1145/2020408.2020515 – volume: 246 start-page: 253 issue: 1 year: 2016 ident: 10.1016/j.ins.2020.08.121_b0205 article-title: A parallelized lagrangean relaxation approach for the discrete ordered median problem publication-title: Ann. Oper. Res. doi: 10.1007/s10479-014-1744-x – volume: 39 start-page: 133 year: 2013 ident: 10.1016/j.ins.2020.08.121_b0235 article-title: Ranked k-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets publication-title: Knowl.-Based Syst. doi: 10.1016/j.knosys.2012.10.012 – volume: 92 start-page: 464 year: 2018 ident: 10.1016/j.ins.2020.08.121_b0230 article-title: An improved k-medoids algorithm based on step increasing and optimizing medoids publication-title: Expert Syst. Appl. doi: 10.1016/j.eswa.2017.09.052 – volume: 39 start-page: 1625 issue: 7 year: 2012 ident: 10.1016/j.ins.2020.08.121_b0020 article-title: An aggregation heuristic for large scale p-median problem publication-title: Comput. Oper. Res. doi: 10.1016/j.cor.2011.09.016 – volume: 315 start-page: 972 issue: 5814 year: 2007 ident: 10.1016/j.ins.2020.08.121_b0080 article-title: Clustering by passing messages between data points publication-title: Science doi: 10.1126/science.1136800 – start-page: 181 year: 2005 ident: 10.1016/j.ins.2020.08.121_b0245 article-title: A new and efficient k-medoid algorithm for spatial clustering – volume: vol. 
2 start-page: 82 year: 1974 ident: 10.1016/j.ins.2020.08.121_b0100 article-title: Lagrangean relaxation for integer programming – start-page: 41.1 year: 2015 ident: 10.1016/j.ins.2020.08.121_b0190 – start-page: 2049 year: 2013 ident: 10.1016/j.ins.2020.08.121_b0160 article-title: Distributed submodular maximization: Identifying representative elements in massive data – volume: 33 start-page: 544 issue: 3 year: 2004 ident: 10.1016/j.ins.2020.08.121_b0015 article-title: Local search heuristics for k-median and facility location problems publication-title: SIAM J. Comput. doi: 10.1137/S0097539702416402 – volume: 10 start-page: 293 issue: 3 year: 2004 ident: 10.1016/j.ins.2020.08.121_b0065 article-title: Cooperative parallel variable neighborhood search for the p-median publication-title: J. Heuristics doi: 10.1023/B:HEUR.0000026897.40171.1a – volume: 5 start-page: 207 issue: 4 year: 1997 ident: 10.1016/j.ins.2020.08.121_b0115 article-title: Variable neighborhood search for the p-median publication-title: Locat. Sci. doi: 10.1016/S0966-8349(98)00030-8 – volume: 38 start-page: 764 issue: 1 year: 2011 ident: 10.1016/j.ins.2020.08.121_b0135 article-title: Variance enhanced k-medoid clustering publication-title: Expert Syst. Appl. doi: 10.1016/j.eswa.2010.07.030 – volume: 80 start-page: 1159 year: 2016 ident: 10.1016/j.ins.2020.08.121_b0200 article-title: AGORAS: A fast algorithm for estimating medoids in large datasets publication-title: Procedia Comput. Sci. doi: 10.1016/j.procs.2016.05.446 – ident: 10.1016/j.ins.2020.08.121_b0220 doi: 10.1145/3097983.3098098 – year: 2019 ident: 10.1016/j.ins.2020.08.121_b0170 article-title: On solving large p-median problems publication-title: Environ. Plan. B - Plan. Des. 
– volume: 2 start-page: 221 issue: 2 year: 2011 ident: 10.1016/j.ins.2020.08.121_b0195 article-title: Using pivots to speed-up k-medoids clustering publication-title: JIDM – volume: 13 start-page: 182 issue: 1 year: 1984 ident: 10.1016/j.ins.2020.08.121_b0155 article-title: On the complexity of some common geometric location problems publication-title: SIAM J. Comput. doi: 10.1137/0213014 – volume: 13 start-page: 1626 issue: 7 year: 2018 ident: 10.1016/j.ins.2020.08.121_b0215 article-title: Face clustering: Representation and pairwise constraints publication-title: IEEE Trans. Inf. Forensics Secur. doi: 10.1109/TIFS.2018.2796999 – start-page: 368 year: 2010 ident: 10.1016/j.ins.2020.08.121_b0075 article-title: Building rome on a cloudless day – ident: 10.1016/j.ins.2020.08.121_b0040 doi: 10.1145/2688073.2688116 – volume: 15 start-page: 261 issue: 3 year: 1964 ident: 10.1016/j.ins.2020.08.121_b0145 article-title: On the location of supply points to minimize transport costs publication-title: Oper. Res. Quart. doi: 10.1057/jors.1964.47 – start-page: 573 year: 2014 ident: 10.1016/j.ins.2020.08.121_b0250 article-title: K-medoids clustering based on mapreduce and optimal search of medoids – volume: 48 start-page: 274 issue: 2 year: 2001 ident: 10.1016/j.ins.2020.08.121_b0125 article-title: Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation publication-title: J. 
ACM doi: 10.1145/375827.375845 – ident: 10.1016/j.ins.2020.08.121_b0030 doi: 10.1016/S1571-0653(04)00427-5 – volume: 31 start-page: 1648 issue: 10 year: 2015 ident: 10.1016/j.ins.2020.08.121_b0045 article-title: GrammR: graphical representation and modeling of count data with application in metagenomics publication-title: Bioinformatics doi: 10.1093/bioinformatics/btv032 – volume: 33 start-page: 568 issue: 3 year: 2011 ident: 10.1016/j.ins.2020.08.121_b0060 article-title: Parallel spectral clustering in distributed systems publication-title: IEEE Trans. Pattern Anal. Mach. Intell. doi: 10.1109/TPAMI.2010.88 – volume: 37 start-page: 539 issue: 3 year: 1979 ident: 10.1016/j.ins.2020.08.121_b0130 article-title: An algorithmic approach to network location problems. II: The p-medians publication-title: SIAM J. Appl. Math. doi: 10.1137/0137041 – volume: 109 start-page: 89 issue: 1 year: 2007 ident: 10.1016/j.ins.2020.08.121_b0035 article-title: Computational study of large-scale p-median problems publication-title: Math. Program. doi: 10.1007/s10107-005-0700-6 – volume: 12 start-page: 447 issue: 6 year: 2006 ident: 10.1016/j.ins.2020.08.121_b0210 article-title: A genetic k-medoids clustering algorithm publication-title: J. Heuristics doi: 10.1007/s10732-006-7284-z – ident: 10.1016/j.ins.2020.08.121_b0185 doi: 10.1016/j.eswa.2008.01.039 – ident: 10.1016/j.ins.2020.08.121_b0010 doi: 10.1609/socs.v4i1.18282 – ident: 10.1016/j.ins.2020.08.121_b0055 doi: 10.1109/FG.2018.00020 – volume: 40 start-page: 289 issue: 2 year: 2018 ident: 10.1016/j.ins.2020.08.121_b0180 article-title: Clustering millions of faces by identity publication-title: IEEE Trans. Pattern Anal. Mach. Intell. doi: 10.1109/TPAMI.2017.2679100 – year: 2008 ident: 10.1016/j.ins.2020.08.121_b0140 – volume: 15 start-page: 597 issue: 6 year: 2008 ident: 10.1016/j.ins.2020.08.121_b0025 article-title: An effective heuristic for large-scale capacitated facility location problems publication-title: J.
Heuristics doi: 10.1007/s10732-008-9078-y – volume: 19 start-page: 351 issue: 3 year: 2009 ident: 10.1016/j.ins.2020.08.121_b0110 article-title: Solving large p-median clustering problems by primal-dual variable neighborhood search publication-title: Data Min. Knowl. Discov. doi: 10.1007/s10618-009-0135-4 |
| StartPage | 344 |
| SubjectTerms | CLARA; K-medoids clustering; MPI; Nearest neighbors; P-median problem; Parallel and distributed computing |
| Title | Near-optimal large-scale k-medoids clustering |
| URI | https://dx.doi.org/10.1016/j.ins.2020.08.121 |
| Volume | 545 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Near-optimal+large-scale+k-medoids+clustering&rft.jtitle=Information+sciences&rft.au=Ushakov%2C+Anton+V.&rft.au=Vasilyev%2C+Igor&rft.date=2021-02-04&rft.pub=Elsevier+Inc&rft.issn=0020-0255&rft.eissn=1872-6291&rft.volume=545&rft.spage=344&rft.epage=362&rft_id=info:doi/10.1016%2Fj.ins.2020.08.121&rft.externalDocID=S0020025520308872 |