An efficient k-means clustering algorithm: Analysis and implementation

Bibliographic details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, pp. 881-892
Main authors: Kanungo, Tapas; Mount, David M.; Netanyahu, Nathan S.; Piatko, Christine D.; Silverman, Ruth; Wu, Angela Y.
Format: Journal Article
Language: English
Published: July 1, 2002
ISSN: 0162-8828
Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
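The abstract names the ingredients of the filtering algorithm (Lloyd's iteration driven by a kd-tree) without showing how the tree is used. The Python sketch below is one possible reading under stated assumptions, not the authors' code: each kd-tree node is assumed to store the bounding box, vector sum, and count of its points; at each node, the candidate center z* closest to the box midpoint is found, and any other candidate whose distance to the box vertex lying furthest toward it is no smaller than z*'s is discarded, since it cannot be nearest to any point in the box; when only one candidate survives, the node's whole point sum is credited to it at once. The node layout, the median split on the widest dimension, `leaf_size`, and all function names (`build`, `is_farther`, `filter_node`, `lloyd_step_filtering`, `lloyd_step_brute`) are hypothetical.

```python
# Illustrative sketch: names, node layout, and split rule are assumptions, not the paper's code.
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean distance


@dataclass
class Node:
    lo: List[float]                       # bounding box of this node's points (assumed layout)
    hi: List[float]
    wsum: List[float]                     # vector sum of the points
    count: int                            # number of points
    points: Optional[List[Point]] = None  # stored only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], leaf_size: int = 4) -> Node:
    """kd-tree split at the median of the widest dimension (assumed rule, leaf_size >= 1)."""
    d = len(points[0])
    lo = [min(p[i] for p in points) for i in range(d)]
    hi = [max(p[i] for p in points) for i in range(d)]
    wsum = [sum(p[i] for p in points) for i in range(d)]
    node = Node(lo, hi, wsum, len(points))
    if len(points) <= leaf_size:
        node.points = points
        return node
    axis = max(range(d), key=lambda i: hi[i] - lo[i])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    node.left, node.right = build(pts[:mid], leaf_size), build(pts[mid:], leaf_size)
    return node


def is_farther(z: Point, z_star: Point, lo: List[float], hi: List[float]) -> bool:
    """True if z is no closer than z_star to all of box [lo, hi]; checks one vertex."""
    v = [h_i if z_i > s_i else l_i for z_i, s_i, l_i, h_i in zip(z, z_star, lo, hi)]
    return sq_dist(z, v) >= sq_dist(z_star, v)


def filter_node(node: Node, cands: List[int], centers: List[Point],
                sums: List[List[float]], counts: List[int]) -> None:
    """Credit each point under `node` to its nearest center among `cands`."""
    if node.points is not None:  # leaf: test the surviving candidates point by point
        for p in node.points:
            j = min(cands, key=lambda c: sq_dist(p, centers[c]))
            counts[j] += 1
            sums[j] = [s + x for s, x in zip(sums[j], p)]
        return
    mid = [(a + b) / 2 for a, b in zip(node.lo, node.hi)]
    z_star = min(cands, key=lambda c: sq_dist(mid, centers[c]))
    kept = [c for c in cands if c == z_star
            or not is_farther(centers[c], centers[z_star], node.lo, node.hi)]
    if len(kept) == 1:  # no other candidate is closer to any point here: use stored sum
        counts[z_star] += node.count
        sums[z_star] = [s + w for s, w in zip(sums[z_star], node.wsum)]
        return
    for child in (node.left, node.right):
        filter_node(child, kept, centers, sums, counts)


def _new_centers(sums, counts, centers):
    """Mean of each cluster's points; an empty cluster keeps its old center."""
    return [tuple(s / c for s in sm) if c else z for sm, c, z in zip(sums, counts, centers)]


def lloyd_step_filtering(root: Node, centers: List[Point]) -> List[Point]:
    """One Lloyd iteration driven by the kd-tree filtering pass."""
    k, d = len(centers), len(centers[0])
    sums, counts = [[0.0] * d for _ in range(k)], [0] * k
    filter_node(root, list(range(k)), centers, sums, counts)
    return _new_centers(sums, counts, centers)


def lloyd_step_brute(points: List[Point], centers: List[Point]) -> List[Point]:
    """Reference Lloyd iteration: compare every point with all k centers."""
    k, d = len(centers), len(centers[0])
    sums, counts = [[0.0] * d for _ in range(k)], [0] * k
    for p in points:
        j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
        counts[j] += 1
        sums[j] = [s + x for s, x in zip(sums[j], p)]
    return _new_centers(sums, counts, centers)


if __name__ == "__main__":
    random.seed(0)  # synthetic demo data: three well-separated 2-D Gaussian blobs
    pts = [(random.gauss(cx, 0.5), random.gauss(cy, 0.5))
           for cx, cy in ((0, 0), (6, 0), (3, 5)) for _ in range(200)]
    root, centers = build(pts), random.sample(pts, 3)
    for _ in range(10):
        centers = lloyd_step_filtering(root, centers)
    print(centers)
```

Because a center is discarded only when it cannot be strictly closer than z* to any point in the box, `lloyd_step_filtering` should return the same centers as the brute-force `lloyd_step_brute` reference, up to floating-point summation order and exact distance ties. The saving comes from crediting whole subtrees at once; that happens more often when clusters are well separated, which fits the data-sensitive behavior described in the abstract.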
DOI: 10.1109/TPAMI.2002.1017616
Discipline: Engineering; Computer Science
URI: https://www.proquest.com/docview/27579064