Near-optimal large-scale k-medoids clustering

Bibliographic details
Published in: Information Sciences, vol. 545, pp. 344-362
Main authors: Ushakov, Anton V.; Vasilyev, Igor
Format: Journal Article
Language: English
Published: Elsevier Inc, 2021-02-04
ISSN: 0020-0255, 1872-6291
Abstract The k-medoids (k-median) problem is one of the best-known unsupervised clustering problems. Due to its complexity, finding high-quality solutions for huge-scale datasets remains extremely challenging. Many approaches that find optimal or high-quality solutions are applicable only to small and medium-size instances. On the other hand, many parallel, distributed algorithms that can handle huge-scale datasets usually provide very poor solutions. In this paper, we develop a first parallel, distributed primal–dual heuristic algorithm for the k-medoids problem. Its main component is a very efficient parallel subgradient column generation procedure that solves a Lagrangian dual problem and finds a tight bound on solution quality. High-quality solutions are then produced by a parallel core selection technique. We considerably reduce the computational burden and memory load by employing a nearest neighbor strategy to approximate the dissimilarity matrix. We demonstrate that our algorithm finds near-optimal solutions, as confirmed by the tightness of the dual bounds, for instances that are much larger than those considered in the literature to date. Our experiments include clustering large-scale collections of face images into several thousand clusters. We show that our approach outperforms improved parallel versions of the most popular k-medoids clustering algorithms, achieving nearly linear parallel speedup.
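The Lagrangian dual bound mentioned in the abstract can be illustrated with a minimal, sequential sketch. It is not the authors' implementation: the function name `lagrangian_bounds`, the step-size rule, and the toy data are assumptions made for illustration, and the sketch omits the column generation, parallelism, nearest neighbor approximation, and core selection that the abstract describes. It relaxes the assignment constraints of the standard p-median model and applies plain subgradient updates, returning a lower bound, the cost of a feasible solution, and that solution's medoids.

```python
def lagrangian_bounds(d, p, iters=200):
    """Subgradient sketch for the Lagrangian dual of the p-median model.

    d is a full n x n dissimilarity matrix (list of lists), p the number
    of medoids. Returns (lower_bound, upper_bound, medoids).
    """
    n = len(d)
    lam = [0.0] * n                      # one multiplier per point (assignment constraint)
    best_lb, best_ub, best_medoids = float("-inf"), float("inf"), []
    theta = 2.0                          # step-size scale (assumed heuristic rule)
    for _ in range(iters):
        # Reduced cost of opening candidate medoid j under the current multipliers.
        rho = [sum(min(0.0, d[i][j] - lam[i]) for i in range(n)) for j in range(n)]
        chosen = sorted(range(n), key=rho.__getitem__)[:p]
        lb = sum(lam) + sum(rho[j] for j in chosen)   # valid lower bound for any lam
        # Primal heuristic: assign every point to its closest chosen medoid.
        ub = sum(min(d[i][j] for j in chosen) for i in range(n))
        if ub < best_ub:
            best_ub, best_medoids = ub, sorted(chosen)
        if lb > best_lb:
            best_lb = lb
        else:
            theta *= 0.95                # shrink steps when the bound stalls
        # Subgradient: 1 minus the number of chosen medoids that attract point i.
        g = [1 - sum(1 for j in chosen if d[i][j] < lam[i]) for i in range(n)]
        norm = sum(v * v for v in g)
        if norm == 0 or best_ub - best_lb < 1e-9:
            break
        step = theta * (best_ub - lb) / norm
        lam = [lam[i] + step * g[i] for i in range(n)]
    return best_lb, best_ub, best_medoids


# Toy usage: six points on a line, two clusters (illustrative data only).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in points] for a in points]
lower, upper, medoids = lagrangian_bounds(dist, p=2)
print(lower, upper, medoids)
```

Any multiplier vector yields a valid lower bound, so the gap between the two returned values limits how far the heuristic solution can be from the optimum.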
Author Vasilyev, Igor
Ushakov, Anton V.
Author_xml – sequence: 1
  givenname: Anton V.
  surname: Ushakov
  fullname: Ushakov, Anton V.
  email: aushakov@icc.ru
  organization: Matrosov Institute for System Dynamics and Control Theory of the Siberian Branch of the Russian Academy of Sciences, 134 Lermontov Str., 664033 Irkutsk, Russia
– sequence: 2
  givenname: Igor
  surname: Vasilyev
  fullname: Vasilyev, Igor
  email: vil@icc.ru
  organization: Matrosov Institute for System Dynamics and Control Theory of the Siberian Branch of the Russian Academy of Sciences, 134 Lermontov Str., 664033 Irkutsk, Russia
CitedBy_id crossref_primary_10_1093_comjnl_bxab206
crossref_primary_10_1016_j_jcmds_2022_100034
crossref_primary_10_1016_j_engappai_2025_111082
crossref_primary_10_1007_s11036_023_02249_w
crossref_primary_10_1155_2022_3881265
crossref_primary_10_1007_s10489_025_06813_7
crossref_primary_10_1007_s00371_025_04054_w
crossref_primary_10_1093_comjnl_bxaf041
crossref_primary_10_1016_j_neucom_2025_131225
crossref_primary_10_1016_j_patcog_2024_111062
crossref_primary_10_3390_math10203807
crossref_primary_10_1016_j_oceaneng_2024_119439
crossref_primary_10_1016_j_segan_2022_100658
crossref_primary_10_3390_en14133735
crossref_primary_10_1061_JPSEA2_PSENG_1439
crossref_primary_10_3390_a16090396
crossref_primary_10_1016_j_neucom_2025_129705
crossref_primary_10_1109_ACCESS_2021_3130551
crossref_primary_10_1177_14727978251361552
crossref_primary_10_1007_s11042_023_14873_5
crossref_primary_10_1016_j_eswa_2023_120278
crossref_primary_10_1007_s10462_022_10325_y
crossref_primary_10_1016_j_neucom_2025_131576
crossref_primary_10_1088_1757_899X_1088_1_012034
crossref_primary_10_1002_int_22714
crossref_primary_10_1109_JSEN_2024_3392594
crossref_primary_10_1016_j_ins_2021_05_070
crossref_primary_10_3233_IDT_230123
crossref_primary_10_1016_j_asoc_2021_107924
crossref_primary_10_1364_AO_520256
crossref_primary_10_1186_s12885_024_13387_z
crossref_primary_10_1109_TNSE_2024_3402383
crossref_primary_10_1016_j_ins_2022_09_011
crossref_primary_10_1134_S1990478921040128
crossref_primary_10_1109_ACCESS_2024_3469369
crossref_primary_10_1016_j_ejor_2022_11_033
crossref_primary_10_1109_TETCI_2023_3336537
crossref_primary_10_1016_j_agrformet_2025_110426
crossref_primary_10_3233_JIFS_222721
crossref_primary_10_1007_s42154_022_00205_0
crossref_primary_10_1155_2022_8566253
crossref_primary_10_3390_math12193083
crossref_primary_10_1002_sim_70109
crossref_primary_10_1016_j_energy_2022_126100
crossref_primary_10_1016_j_ins_2021_02_008
crossref_primary_10_1061_JPSEA2_PSENG_1611
crossref_primary_10_1016_j_ins_2023_01_122
crossref_primary_10_1016_j_apgeog_2024_103428
crossref_primary_10_3390_math10142519
crossref_primary_10_1016_j_cor_2023_106181
crossref_primary_10_1016_j_ins_2021_12_016
crossref_primary_10_1016_j_oceaneng_2023_114912
Cites_doi 10.1023/A:1015013919497
10.1109/CVPR.2015.7298596
10.1016/S0167-8191(03)00043-7
10.1109/LSP.2016.2603342
10.1016/0377-2217(93)90118-7
10.1016/j.ejor.2014.01.050
10.1287/ijoc.1100.0418
10.1016/j.ejor.2005.05.034
10.1145/2020408.2020515
10.1007/s10479-014-1744-x
10.1016/j.knosys.2012.10.012
10.1016/j.eswa.2017.09.052
10.1016/j.cor.2011.09.016
10.1126/science.1136800
10.1137/S0097539702416402
10.1023/B:HEUR.0000026897.40171.1a
10.1016/S0966-8349(98)00030-8
10.1016/j.eswa.2010.07.030
10.1016/j.procs.2016.05.446
10.1145/3097983.3098098
10.1137/0213014
10.1109/TIFS.2018.2796999
10.1145/2688073.2688116
10.1057/jors.1964.47
10.1145/375827.375845
10.1016/S1571-0653(04)00427-5
10.1093/bioinformatics/btv032
10.1109/TPAMI.2010.88
10.1137/0137041
10.1007/s10107-005-0700-6
10.1007/s10732-006-7284-z
10.1016/j.eswa.2008.01.039
10.1609/socs.v4i1.18282
10.1109/FG.2018.00020
10.1109/TPAMI.2017.2679100
10.1007/s10732-008-9078-y
10.1007/s10618-009-0135-4
ContentType Journal Article
Copyright 2020 Elsevier Inc.
Copyright_xml – notice: 2020 Elsevier Inc.
DBID AAYXX
CITATION
DOI 10.1016/j.ins.2020.08.121
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Library & Information Science
EISSN 1872-6291
EndPage 362
ExternalDocumentID 10_1016_j_ins_2020_08_121
S0020025520308872
ISICitedReferencesCount 58
ISSN 0020-0255
IngestDate Tue Nov 18 22:09:45 EST 2025
Sat Nov 29 07:30:32 EST 2025
Fri Feb 23 02:48:50 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords P-median problem
Parallel and distributed computing
CLARA
K-medoids clustering
Nearest neighbors
MPI
Language English
LinkModel OpenURL
PageCount 19
PublicationCentury 2000
PublicationDate 2021-02-04
PublicationDateYYYYMMDD 2021-02-04
PublicationDate_xml – month: 02
  year: 2021
  text: 2021-02-04
  day: 04
PublicationDecade 2020
PublicationTitle Information sciences
PublicationYear 2021
Publisher Elsevier Inc
Publisher_xml – name: Elsevier Inc
References García, Labbé, Marín (b0085) 2011; 23
Geoffrion (b0100) 1974; vol. 2
A. Arbelaez, L. Quesada. Parallelising the k-medoids clustering problem using space-partitioning, in: M. Helmert and G. Röger, editors, Proc. the 6th Annual Symposium on Combinatorial Search, SoCS 2013, pages 20–28. AAAI, 2013.
Yu, Liu, Guo, Liu (b0230) 2018; 92
Megiddo, Supowit (b0155) 1984; 13
Rangel, Hendrix, Agrawal, Liao, Choudhary (b0200) 2016; 80
Lai, Fu (b0135) 2011; 38
Otto, Wang, Jain (b0180) 2018; 40
Zhu, Wang, Shan, Lv (b0250) 2014
Avella, Boccia, Sforza, Vasilyev (b0025) 2008; 15
Maze, Adams, Duncan (b0150) 2018
Zhang, Couloigner (b0245) 2005
Zhang, Zhang, Li, Qiao (b0240) 2016; 23
Norvill, Fiz Pontiveros, State, Awan, Cullen (b0175) 2017
Irkutsk supercomputer center of SB RAS. URL:http://hpc.icc.ru/en/. Accessed: 2020-05-19.
Avella, Sassano, Vasilyev (b0035) 2007; 109
Redondo, Marín, Ortigosa (b0205) 2016; 246
Shi, Otto, Jain (b0215) 2018; 13
Chen, Song, Bai, Lin, Chang (b0060) 2011; 33
Frahm (b0075) 2010
Sheng, Liu (b0210) 2006; 12
F. Garcia-López, B. Melián-Batista, J.A. Moreno-Pérez, J.M. Moreno-Vega. Parallelization of the scatter search for the p-median problem. Parallel Comput., 29(5):575–589, 2003. Parallel computing in logistics.
Jain, Vazirani (b0125) 2001; 48
Mladenović, Brimberg, Hansen, Moreno-Pérez (b0165) 2007; 179
Arya, Garg, Khandekar, Meyerson, Munagala, Pandit (b0015) 2004; 33
Paterlini, Nascimento, Traina (b0195) 2011; 2
Wang, Wang, Wilkes (b0225) 2020
Mu, Tong (b0170) 2019
Q. Cao, L. Shen, W. Xie, O.M. Parkhi, A. Zisserman. VGGFace2: A dataset for recognising faces across pose and age, in Proc. 13th IEEE International Conference on Automatic Face & Gesture Recognition, FG 2018, pages 67–74. IEEE, 2018.
Garcia-López, Melián-Batista, Moreno-Pérez, Moreno-Vega (b0090) 2002; 8
Irawan, Salhi, Scaparra (b0120) 2014; 237
Zadegan, Mirzaie, Sadoughi (b0235) 2013; 39
Mirzasoleiman, Karbasi, Sarkar, Krause (b0160) 2013
A. Ene, S. Im, B. Moseley. Fast clustering using MapReduce, in: Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, pages 681–689, New York, 2011. ACM.
H.-S. Park and C.-H. Jun. A simple and fast algorithm for k-medoids clustering. Expert. Syst. Appl., 36(2, Part 2):3336–3341, 2009.
Hansen, Brimberg, Urosević, Mladenović (b0110) 2009; 19
Parkhi, Vedaldi, Zisserman (b0190) 2015
Ayyala, Lin (b0045) 2015; 31
Maranzana (b0145) 1964; 15
Y. Gong, M. Pawlowski, F. Yang, L. Brandy, L. Boundev, R. Fergus. Web scale photo hash clustering on a single machine, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 19–27, June 2015.
H. Song, J.-G. Lee, W.-S. Han. Pamae: Parallel k-medoids clustering with high accuracy and efficiency, in: Proc. 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, pages 1087–1096, New York, 2017. ACM.
P. Awasthi, A.S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, R. Ward. Relax, no need to round: Integrality of clustering formulations, in: Proc. the 2015 Conference on Innovations in Theoretical Computer Science, ITCS ’15, pages 191–200, New York, 2015. ACM.
P. Avella, A. Sassano, I. Vasil’ev. A heuristic for large-scale p-median instances. Electron. Notes in Discrete Math., 13(0):14–17, 2003. 2nd Cologne-Twente Workshop on Graphs and Combinatorial Optimization.
Crainic, Gendreau, Hansen, Mladenović (b0065) 2004; 10
Kariv, Hakimi (b0130) 1979; 37
Frey, Dueck (b0080) 2007; 315
Avella, Boccia, Salerno, Vasilyev (b0020) 2012; 39
Beasley (b0050) 1993; 65
Hansen, Mladenović (b0115) 1997; 5
Manning, Raghavan, Schütze (b0140) 2008
References_xml – volume: 39
  start-page: 1625
  year: 2012
  end-page: 1632
  ident: b0020
  article-title: An aggregation heuristic for large scale p-median problem
  publication-title: Comput. Oper. Res.
– reference: H.-S. Park and C.-H. Jun. A simple and fast algorithm for k-medoids clustering. Expert. Syst. Appl., 36(2, Part 2):3336–3341, 2009.
– volume: 37
  start-page: 539
  year: 1979
  end-page: 560
  ident: b0130
  article-title: An algorithmic approach to network location problems. ii: The p-medians
  publication-title: SIAM J. Appl. Math.
– volume: 237
  start-page: 590
  year: 2014
  end-page: 605
  ident: b0120
  article-title: An adaptive multiphase approach for large unconditional and conditional p-median problems
  publication-title: Eur. J. Oper. Res.
– volume: 5
  start-page: 207
  year: 1997
  end-page: 226
  ident: b0115
  article-title: Variable neighborhood search for the p-median
  publication-title: Locat. Sci.
– volume: 40
  start-page: 289
  year: 2018
  end-page: 303
  ident: b0180
  article-title: Clustering millions of faces by identity
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– volume: 13
  start-page: 1626
  year: 2018
  end-page: 1640
  ident: b0215
  article-title: Face clustering: Representation and pairwise constraints
  publication-title: IEEE Trans. Inf. Forensics Secur.
– volume: 80
  start-page: 1159
  year: 2016
  end-page: 1169
  ident: b0200
  article-title: AGORAS: A fast algorithm for estimating medoids in large datasets
  publication-title: Procedia Comput. Sci.
– reference: Irkutsk supercomputer center of SB RAS. URL:http://hpc.icc.ru/en/. Accessed: 2020-05-19.
– volume: vol. 2
  start-page: 82
  year: 1974
  end-page: 114
  ident: b0100
  article-title: Lagrangean relaxation for integer programming
  publication-title: Approaches to Integer Programming
– start-page: 368
  year: 2010
  end-page: 381
  ident: b0075
  article-title: Building rome on a cloudless day
  publication-title: Computer Vision - ECCV 2010
– volume: 23
  start-page: 546
  year: 2011
  end-page: 556
  ident: b0085
  article-title: Solving large p-median problems with a radius formulation
  publication-title: INFORMS J. Comput.
– volume: 65
  start-page: 383
  year: 1993
  end-page: 399
  ident: b0050
  article-title: Lagrangean heuristics for location problems
  publication-title: Eur. J. Oper. Res.
– start-page: 85
  year: 2020
  end-page: 108
  ident: b0225
  article-title: Machine Learning-based Natural Scene Recognition for Mobile Robot Localization in An Unknown Environment, chapter An Efficient K-Medoids Clustering Algorithm for Large Scale Data
– year: 2019
  ident: b0170
  article-title: On solving large p-median problems
  publication-title: Environ. Plan. B - Plan. Des.
– reference: H. Song, J.-G. Lee, W.-S. Han. Pamae: Parallel k-medoids clustering with high accuracy and efficiency, in: Proc. 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, pages 1087–1096, New York, 2017. ACM.
– volume: 109
  start-page: 89
  year: 2007
  end-page: 114
  ident: b0035
  article-title: Computational study of large-scale p-median problems
  publication-title: Math. Program.
– start-page: 181
  year: 2005
  end-page: 189
  ident: b0245
  article-title: A new and efficient k-medoid algorithm for spatial clustering
  publication-title: Computational Science and Its Applications – ICCSA 2005
– start-page: 573
  year: 2014
  end-page: 577
  ident: b0250
  article-title: K-medoids clustering based on mapreduce and optimal search of medoids
  publication-title: 2014 9th International Conference on Computer Science Education
– reference: P. Avella, A. Sassano, I. Vasil’ev. A heuristic for large-scale p-median instances. Electron. Notes in Discrete Math., 13(0):14–17, 2003. 2nd Cologne-Twente Workshop on Graphs and Combinatorial Optimization.
– start-page: 158
  year: 2018
  end-page: 165
  ident: b0150
  article-title: IARPA Janus Benchmark - C: Face dataset and protocol
  publication-title: Proc. 2018 International Conference on Biometrics (ICB)
– volume: 315
  start-page: 972
  year: 2007
  end-page: 976
  ident: b0080
  article-title: Clustering by passing messages between data points
  publication-title: Science
– volume: 23
  start-page: 1499
  year: 2016
  end-page: 1503
  ident: b0240
  article-title: Joint face detection and alignment using multitask cascaded convolutional networks
  publication-title: IEEE Signal Process. Lett.
– reference: P. Awasthi, A.S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, R. Ward. Relax, no need to round: Integrality of clustering formulations, in: Proc. the 2015 Conference on Innovations in Theoretical Computer Science, ITCS ’15, pages 191–200, New York, 2015. ACM.
– volume: 39
  start-page: 133
  year: 2013
  end-page: 143
  ident: b0235
  article-title: Ranked k-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets
  publication-title: Knowl.-Based Syst.
– volume: 31
  start-page: 1648
  year: 2015
  end-page: 1654
  ident: b0045
  article-title: GrammR: graphical representation and modeling of count data with application in metagenomics
  publication-title: Bioinformatics
– volume: 38
  start-page: 764
  year: 2011
  end-page: 775
  ident: b0135
  article-title: Variance enhanced k-medoid clustering
  publication-title: Expert Syst. Appl.
– volume: 15
  start-page: 261
  year: 1964
  end-page: 270
  ident: b0145
  article-title: On the location of supply points to minimize transport costs
  publication-title: Oper. Res. Quart.
– volume: 246
  start-page: 253
  year: 2016
  end-page: 272
  ident: b0205
  article-title: A parallelized lagrangean relaxation approach for the discrete ordered median problem
  publication-title: Ann. Oper. Res.
– reference: A. Arbelaez, L. Quesada. Parallelising the k-medoids clustering problem using space-partitioning, in: M. Helmert and G. Röger, editors, Proc. the 6th Annual Symposium on Combinatorial Search, SoCS 2013, pages 20–28. AAAI, 2013.
– volume: 48
  start-page: 274
  year: 2001
  end-page: 296
  ident: b0125
  article-title: Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation
  publication-title: J. ACM
– reference: F. Garcia-López, B. Melián-Batista, J.A. Moreno-Pérez, J.M. Moreno-Vega. Parallelization of the scatter search for the p-median problem. Parallel Comput., 29(5):575–589, 2003. Parallel computing in logistics.
– reference: Q. Cao, L. Shen, W. Xie, O.M. Parkhi, A. Zisserman. VGGFace2: A dataset for recognising faces across pose and age, in Proc. 13th IEEE International Conference on Automatic Face & Gesture Recognition, FG 2018, pages 67–74. IEEE, 2018.
– start-page: 41.1
  year: 2015
  end-page: 41.12
  ident: b0190
  publication-title: Deep face recognition Proc. the British Machine Vision Conference (BMVC)
– volume: 10
  start-page: 293
  year: 2004
  end-page: 314
  ident: b0065
  article-title: Cooperative parallel variable neighborhood search for the p-median
  publication-title: J. Heuristics
– volume: 33
  start-page: 544
  year: 2004
  end-page: 562
  ident: b0015
  article-title: Local search heuristics for k-median and facility location problems
  publication-title: SIAM J. Comput.
– volume: 12
  start-page: 447
  year: 2006
  end-page: 466
  ident: b0210
  article-title: A genetic k-medoids clustering algorithm
  publication-title: J. Heuristics
– start-page: 1
  year: 2017
  end-page: 6
  ident: b0175
  article-title: Automated labeling of unknown contracts in ethereum
  publication-title: Proc. 26th International Conference on Computer Communication and Networks
– volume: 19
  start-page: 351
  year: 2009
  end-page: 375
  ident: b0110
  article-title: Solving large p-median clustering problems by primal-dual variable neighborhood search
  publication-title: Data Min. Knowl. Discov.
– start-page: 2049
  year: 2013
  end-page: 2057
  ident: b0160
  article-title: Distributed submodular maximization: Identifying representative elements in massive data
  publication-title: Proc. 26th International Conference on Neural Information Processing Systems, volume 2 of NIPS’13
– volume: 13
  start-page: 182
  year: 1984
  end-page: 196
  ident: b0155
  article-title: On the complexity of some common geometric location problems
  publication-title: SIAM J. Comput.
– reference: A. Ene, S. Im, B. Moseley. Fast clustering using MapReduce, in: Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, pages 681–689, New York, 2011. ACM.
– volume: 15
  start-page: 597
  year: 2008
  end-page: 615
  ident: b0025
  article-title: An effective heuristic for large-scale capacitated facility location problems
  publication-title: J. Heuristics
– volume: 33
  start-page: 568
  year: 2011
  end-page: 586
  ident: b0060
  article-title: Parallel spectral clustering in distributed systems
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– volume: 92
  start-page: 464
  year: 2018
  end-page: 473
  ident: b0230
  article-title: An improved k-medoids algorithm based on step increasing and optimizing medoids
  publication-title: Expert Syst. Appl.
– volume: 8
  start-page: 375
  year: 2002
  end-page: 388
  ident: b0090
  article-title: The parallel variable neighborhood search for the p-median problem
  publication-title: J. Heuristics
– volume: 2
  start-page: 221
  year: 2011
  end-page: 236
  ident: b0195
  article-title: Using pivots to speed-up k-medoids clustering
  publication-title: JIDM
– reference: Y. Gong, M. Pawlowski, F. Yang, L. Brandy, L. Boundev, R. Fergus. Web scale photo hash clustering on a single machine, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 19–27, June 2015.
– year: 2008
  ident: b0140
  article-title: Introduction to Information Retrieval
– volume: 179
  start-page: 927
  year: 2007
  end-page: 939
  ident: b0165
  article-title: The p-median problem: A survey of metaheuristic approaches
  publication-title: Eur. J. Oper. Res.
– volume: 8
  start-page: 375
  issue: 3
  year: 2002
  ident: 10.1016/j.ins.2020.08.121_b0090
  article-title: The parallel variable neighborhood search for the p-median problem
  publication-title: J. Heuristics
  doi: 10.1023/A:1015013919497
– ident: 10.1016/j.ins.2020.08.121_b0105
  doi: 10.1109/CVPR.2015.7298596
– ident: 10.1016/j.ins.2020.08.121_b0095
  doi: 10.1016/S0167-8191(03)00043-7
– start-page: 85
  year: 2020
  ident: 10.1016/j.ins.2020.08.121_b0225
– volume: 23
  start-page: 1499
  issue: 10
  year: 2016
  ident: 10.1016/j.ins.2020.08.121_b0240
  article-title: Joint face detection and alignment using multitask cascaded convolutional networks
  publication-title: IEEE Signal Process. Lett.
  doi: 10.1109/LSP.2016.2603342
– volume: 65
  start-page: 383
  issue: 3
  year: 1993
  ident: 10.1016/j.ins.2020.08.121_b0050
  article-title: Lagrangean heuristics for location problems
  publication-title: Eur. J. Oper. Res.
  doi: 10.1016/0377-2217(93)90118-7
– start-page: 158
  year: 2018
  ident: 10.1016/j.ins.2020.08.121_b0150
  article-title: IARPA Janus Benchmark - C: Face dataset and protocol
– volume: 237
  start-page: 590
  issue: 2
  year: 2014
  ident: 10.1016/j.ins.2020.08.121_b0120
  article-title: An adaptive multiphase approach for large unconditional and conditional p-median problems
  publication-title: Eur. J. Oper. Res.
  doi: 10.1016/j.ejor.2014.01.050
– ident: 10.1016/j.ins.2020.08.121_b0005
– volume: 23
  start-page: 546
  issue: 4
  year: 2011
  ident: 10.1016/j.ins.2020.08.121_b0085
  article-title: Solving large p-median problems with a radius formulation
  publication-title: INFORMS J. Comput.
  doi: 10.1287/ijoc.1100.0418
– volume: 179
  start-page: 927
  issue: 3
  year: 2007
  ident: 10.1016/j.ins.2020.08.121_b0165
  article-title: The p-median problem: A survey of metaheuristic approaches
  publication-title: Eur. J. Oper. Res.
  doi: 10.1016/j.ejor.2005.05.034
– start-page: 1
  year: 2017
  ident: 10.1016/j.ins.2020.08.121_b0175
  article-title: Automated labeling of unknown contracts in ethereum
– ident: 10.1016/j.ins.2020.08.121_b0070
  doi: 10.1145/2020408.2020515
– volume: 246
  start-page: 253
  issue: 1
  year: 2016
  ident: 10.1016/j.ins.2020.08.121_b0205
  article-title: A parallelized lagrangean relaxation approach for the discrete ordered median problem
  publication-title: Ann. Oper. Res.
  doi: 10.1007/s10479-014-1744-x
– volume: 39
  start-page: 133
  year: 2013
  ident: 10.1016/j.ins.2020.08.121_b0235
  article-title: Ranked k-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets
  publication-title: Knowl.-Based Syst.
  doi: 10.1016/j.knosys.2012.10.012
– volume: 92
  start-page: 464
  year: 2018
  ident: 10.1016/j.ins.2020.08.121_b0230
  article-title: An improved k-medoids algorithm based on step increasing and optimizing medoids
  publication-title: Expert Syst. Appl.
  doi: 10.1016/j.eswa.2017.09.052
– volume: 39
  start-page: 1625
  issue: 7
  year: 2012
  ident: 10.1016/j.ins.2020.08.121_b0020
  article-title: An aggregation heuristic for large scale p-median problem
  publication-title: Comput. Oper. Res.
  doi: 10.1016/j.cor.2011.09.016
– volume: 315
  start-page: 972
  issue: 5814
  year: 2007
  ident: 10.1016/j.ins.2020.08.121_b0080
  article-title: Clustering by passing messages between data points
  publication-title: Science
  doi: 10.1126/science.1136800
– start-page: 181
  year: 2005
  ident: 10.1016/j.ins.2020.08.121_b0245
StartPage 344
SubjectTerms CLARA
K-medoids clustering
MPI
Nearest neighbors
P-median problem
Parallel and distributed computing
Title Near-optimal large-scale k-medoids clustering
URI https://dx.doi.org/10.1016/j.ins.2020.08.121
Volume 545