k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint

Bibliographic details
Published in:	Journal of Classification, Vol. 38, No. 2, pp. 313-352
Main author:	Młodak, Andrzej
Format:	Journal Article
Language:	English
Published:	New York: Springer US, 1 July 2021; Springer Nature B.V.
Subjects:	Algorithms; Bioinformatics; Clustering; Constraints; Empirical analysis; Heterogeneity; Homogeneity; Indexes; Marketing; Mathematics and Statistics; Neighborhoods; Operations research; Original Research; Pattern Recognition; Psychometrics; Quality assessment; Signal, Image and Speech Processing; Simulation; Site selection; Statistical Theory and Methods; Statistics; Peirce index; Probabilistic; Contiguity constraint; Sokal and Sneath index; Means method clustering; Silhouette index; Ward method; Rand index; Correctness of clustering
ISSN:	0176-4268, 1432-1343

Description
Summary:	We analyze some possibilities of using a contiguity (neighbourhood) matrix as a constraint in clustering performed by the k-means and Ward methods, as well as by an approach based on distances and probabilistic assignments aimed at solving the multi-facility location problem (MFLP). That is, special two-stage algorithms, which are forms of clustering with a relational constraint, are proposed. They optimize the division of a set of objects into clusters while respecting the requirement that neighbours have to belong to the same cluster. In the case of probabilistic d-clustering, a relevant modification of its target function is suggested and studied. A versatile simulation study and an empirical analysis verify the practical efficiency of these methods. The quality of clustering is assessed on the basis of indices of homogeneity, heterogeneity and correctness of clusters, as well as the silhouette index. Using these tools and similarity indices (Rand, Peirce, and Sokal and Sneath), it was shown that probabilistic d-clustering can produce better results than Ward's algorithm. In comparison with the k-means approach, probabilistic d-clustering, although it gives rather similar results, is more robust against the creation of trivial (including empty) clusters and produces results that vary less across replications in terms of correctness; that is, it is more predictable from the point of view of clustering quality.
DOI:	10.1007/s00357-020-09370-5