MCS: A Method for Finding the Number of Clusters


Bibliographic Details
Published in: Journal of Classification, Vol. 28, No. 2, pp. 184-209
Main Authors: Albatineh, Ahmed N.; Niewiadomska-Bugaj, Magdalena
Format: Journal Article
Language: English
Published: New York: Springer-Verlag, 1 July 2011
Springer
Springer Nature B.V.
ISSN: 0176-4268, 1432-1343
Description
Summary: This paper proposes a maximum clustering similarity (MCS) method for determining the number of clusters in a data set by studying the behavior of similarity indices that compare two (of several) clustering methods. The similarity between the two clusterings is calculated at the same number of clusters, using the indices of Rand (R), Fowlkes and Mallows (FM), and Kulczynski (K), each corrected for chance agreement. The number of clusters at which the index attains its maximum is a candidate for the optimal number of clusters. The proposed method is applied to simulated bivariate normal data and further extended to circular data. Its performance is compared to the criteria discussed in Tibshirani, Walther, and Hastie (2001). The proposed method does not rely on any distributional or data assumption, which makes it applicable to any type of data that can be clustered using at least two clustering algorithms.
DOI: 10.1007/s00357-010-9069-1
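
Illustration: The selection rule described in the summary (compare two clusterings at each number of clusters with a chance-corrected index, then take the number of clusters where the index peaks) can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. It assumes the Hubert and Arabie adjusted Rand index as the chance-corrected Rand index, which may differ from the exact correction used in the paper. The function names `adjusted_rand` and `mcs_candidate`, and the toy labelings, are hypothetical and chosen only for illustration. The labelings stand in for the output of any two clustering algorithms run at k = 2, 3, 4.

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_a, labels_b):
    """Chance-corrected Rand index (Hubert-Arabie form, an assumption here)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same observations")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two observations")
    # Count pairs of observations placed together by both clusterings,
    # by clustering A, and by clustering B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)  # agreement expected by chance
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        # Degenerate case: both labelings are one cluster or all singletons.
        return 1.0
    return (together_both - expected) / (max_index - expected)


def mcs_candidate(clusterings_a, clusterings_b):
    """Return (best_k, scores); each argument maps k to a list of labels."""
    shared_k = sorted(set(clusterings_a) & set(clusterings_b))
    scores = {k: adjusted_rand(clusterings_a[k], clusterings_b[k]) for k in shared_k}
    # Ties are resolved in favor of the smallest k (a choice made for this sketch).
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical labelings of six observations from two clustering methods.
method_a = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2], 4: [0, 0, 1, 1, 2, 3]}
method_b = {2: [0, 0, 1, 1, 1, 1], 3: [2, 2, 0, 0, 1, 1], 4: [0, 1, 2, 2, 3, 3]}

best_k, scores = mcs_candidate(method_a, method_b)
print(best_k)  # 3
```

In this toy case the two methods agree perfectly only at k = 3, so the index peaks there. Pair counting makes the comparison independent of how each method names its clusters, which is why the permuted labels at k = 3 still score as identical partitions.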