MCS: A Method for Finding the Number of Clusters

Bibliographic Details
Published in:	Journal of Classification, Vol. 28, No. 2, pp. 184-209
Main Authors:	Albatineh, Ahmed N.; Niewiadomska-Bugaj, Magdalena
Format:	Journal Article
Language:	English
Published:	New York, Springer-Verlag, 1 July 2011; Springer; Springer Nature B.V
Subjects:	Algorithmics. Computability. Computer arithmetics | Algorithms | Applied sciences | Bioinformatics | Candidates | Cluster analysis | Clustering | Comparisons | Computer science; control theory; systems | Data | Data analysis | Exact sciences and technology | Indexes | Marketing | Mathematics | Mathematics and Statistics | Methods | Methods of scientific computing (including symbolic computation, algebraic computation) | Multivariate analysis | Numerical analysis. Scientific computation | Pattern Recognition | Probability and statistics | Psychometrics | Sciences and techniques of general use | Signal, Image and Speech Processing | Statistical Theory and Methods | Statistics | Theoretical computing | Similarity index | Gap statistic | Bivariate normal mixture | Number of clusters | Comparing partitions | Circular data | Correction for chance agreement | Clustering algorithm | Mixed distribution | Partition | Discriminant analysis | Cluster analysis (statistics) | Algorithm | Statistical method
ISSN:	0176-4268, 1432-1343
Online Access:	Full text

Description
Abstract:	This paper proposes a maximum clustering similarity (MCS) method for determining the number of clusters in a data set by studying the behavior of similarity indices that compare two (of several) clustering methods. The similarity between the two clusterings is calculated at the same number of clusters, using the indices of Rand (R), Fowlkes and Mallows (FM), and Kulczynski (K), each corrected for chance agreement. The number of clusters at which the index attains its maximum is a candidate for the optimal number of clusters. The proposed method is applied to simulated bivariate normal data and then extended to circular data. Its performance is compared to the criteria discussed in Tibshirani, Walther, and Hastie (2001). Because the proposed method rests on no distributional or data assumption, it is applicable to any type of data that can be clustered using at least two clustering algorithms.
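
The selection rule in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the chance-corrected Rand index is taken in the standard Hubert-Arabie (adjusted Rand) form, since the record does not give the correction formula; the FM and K indices are omitted; the function names `adjusted_rand_index` and `mcs_candidate` are hypothetical; k = 1 is skipped because any two methods agree trivially there; and the label vectors are made-up toy data standing in for the output of two real clustering algorithms.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance (assumed Hubert-Arabie form)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects to form a pair")
    # Pair counts from the contingency table cells and from its margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mcs_candidate(clusterings_a, clusterings_b):
    """Pick the k at which two methods' clusterings are most similar.

    Each argument maps a number of clusters k to that method's label list.
    Only k >= 2 present in both mappings is considered (an assumption of
    this sketch). Ties go to the smallest k.
    """
    shared_ks = sorted(k for k in set(clusterings_a) & set(clusterings_b) if k >= 2)
    if not shared_ks:
        raise ValueError("no shared k >= 2 to compare")
    scores = {
        k: adjusted_rand_index(clusterings_a[k], clusterings_b[k])
        for k in shared_ks
    }
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy labelings of six objects by two hypothetical methods.
method_a = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
method_b = {2: [0, 0, 1, 1, 1, 1], 3: [1, 1, 2, 2, 0, 0]}

best_k, scores = mcs_candidate(method_a, method_b)
print(best_k, scores)
```

In this toy input the two methods produce the same partition at k = 3 (up to relabeling) and disagree at k = 2, so k = 3 is returned as the candidate. In practice the two mappings would be filled by running two different clustering algorithms on the same data over a range of k.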
DOI:	10.1007/s00357-010-9069-1