An efficient k-means clustering algorithm: Analysis and implementation

Bibliographic details
Published in:	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, pp. 881-892
Main authors:	Kanungo, Tapas; Mount, David M.; Netanyahu, Nathan S.; Piatko, Christine D.; Silverman, Ruth; Wu, Angela Y.
Format:	Journal Article
Language:	English
Published:	1 July 2002
ISSN:	0162-8828

Description
Summary:	In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
DOI:	10.1109/TPAMI.2002.1017616
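
The objective and heuristic named in the summary can be illustrated with a minimal sketch of plain Lloyd's algorithm. This is not the paper's filtering algorithm: it uses no kd-tree and compares every point with every center in each pass. The function names (`squared_distance`, `lloyd_kmeans`, `mean_squared_distance`), the random initialization, and the `iterations` and `seed` parameters are assumptions made for illustration only.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's iteration (brute force, no kd-tree).

    points: list of equal-length tuples; k: number of centers.
    Initial centers are k distinct input points chosen at random
    (an assumption; the summary does not specify an initialization).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the centers are stable.
        assignment = new_assignment
        # Update step: move each center to the centroid of its points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # An empty cluster keeps its previous center.
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


def mean_squared_distance(points, centers):
    """The k-means objective: mean squared distance to the nearest center."""
    total = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return total / len(points)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = lloyd_kmeans(data, k=2)
    print(sorted(centers), labels, mean_squared_distance(data, centers))
```

Each pass of this sketch costs on the order of n * k distance computations. According to the summary, the filtering algorithm implements the same Lloyd's iteration more efficiently, with a kd-tree as its only major data structure.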