Sample-weighted clustering methods

Although there have been many researches on cluster analysis considering feature (or variable) weights, little effort has been made regarding sample weights in clustering. In practice, not every sample in a data set has the same importance in cluster analysis. Therefore, it is interesting to obtain...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Computers & mathematics with applications (1987) Ročník 62; číslo 5; s. 2200 - 2208
Hlavní autoři: Yu, Jian, Yang, Miin-Shen, Lee, E. Stanley
Médium: Journal Article
Jazyk:angličtina
Vydáno: Elsevier Ltd 01.09.2011
Témata:
ISSN:0898-1221, 1873-7668
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Although there have been many researches on cluster analysis considering feature (or variable) weights, little effort has been made regarding sample weights in clustering. In practice, not every sample in a data set has the same importance in cluster analysis. Therefore, it is interesting to obtain the proper sample weights for clustering a data set. In this paper, we consider a probability distribution over a data set to represent its sample weights. We then apply the maximum entropy principle to automatically compute these sample weights for clustering. Such method can generate the sample-weighted versions of most clustering algorithms, such as k -means, fuzzy c -means (FCM) and expectation & maximization (EM), etc. The proposed sample-weighted clustering algorithms will be robust for data sets with noise and outliers. Furthermore, we also analyze the convergence properties of the proposed algorithms. This study also uses some numerical data and real data sets for demonstration and comparison. Experimental results and comparisons actually demonstrate that the proposed sample-weighted clustering algorithms are effective and robust clustering methods.
Bibliografie:ObjectType-Article-2
SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 23
ISSN:0898-1221
1873-7668
DOI:10.1016/j.camwa.2011.07.005