A hierarchical clustering algorithm and an improvement of the single linkage criterion to deal with noise

Bibliographic details
Published in: Expert Systems with Applications, Vol. 128, pp. 96-108
Main authors: Ros, Frédéric; Guillaume, Serge
Format: Journal Article
Language: English
Published: New York: Elsevier Ltd, 15 August 2019
Elsevier BV
Elsevier
ISSN: 0957-4174, 1873-6793
Description
Summary:
• This paper proposes a hierarchical clustering algorithm robust to noise.
• It includes a single linkage improvement that involves local density.
• It forbids the merging of representative clusters.
• Performance is assessed using data with known ground truth.

Hierarchical clustering is widely used in data mining. The single linkage criterion is powerful, as it can handle clusters of various shapes and densities, but it is sensitive to noise.¹ Two improvements are proposed in this work to deal with noise. First, the single linkage criterion takes the local density into account, so that the distance involves core points of each group. Second, the hierarchical algorithm forbids the merging of representative clusters, i.e. clusters larger than a minimum size, once they have been identified. The experiments include a sensitivity analysis of the parameters and a comparison of the available criteria on datasets known from the literature. This comparison showed that local criteria yield better results than global ones. The three single linkage criteria were then compared in more challenging situations, which highlighted the complementarity between the two levels of improvement: the criterion and the clustering algorithm.

¹ Sample code is available at: http://frederic.rosresearch.free.fr/mydata/homepage/
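The two ideas in the summary (a single linkage distance restricted to dense "core" points, and a rule that never merges two clusters once both exceed a minimum size) can be illustrated with a small Python sketch. It is not the authors' algorithm; their own sample code is published at the homepage given in the footnote. The density estimate (inverse of the mean distance to the k nearest neighbours), the rule that the densest fraction `core_frac` of a cluster's members are its core points, and the names `cluster`, `k`, `core_frac` and `min_size` are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def local_density(points, k):
    """Assumed density estimate: inverse of the mean distance to the k nearest neighbours."""
    dens = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # The small constant guards against division by zero for duplicate points.
        dens.append(1.0 / (sum(d[:k]) / k + 1e-12))
    return dens


def core_points(members, dens, core_frac):
    """Hypothetical core-point rule: the densest core_frac of a cluster (at least one point)."""
    ranked = sorted(members, key=lambda i: dens[i], reverse=True)
    return ranked[:max(1, math.ceil(core_frac * len(ranked)))]


def core_single_link(a, b, points, dens, core_frac):
    """Single linkage distance restricted to the core points of both clusters."""
    return min(math.dist(points[i], points[j])
               for i in core_points(a, dens, core_frac)
               for j in core_points(b, dens, core_frac))


def cluster(points, n_clusters, k=3, core_frac=0.5, min_size=5):
    """Naive agglomeration that never merges two clusters of size >= min_size."""
    dens = local_density(points, k)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Two "representative" clusters (both >= min_size) may not be merged.
            if len(clusters[a]) >= min_size and len(clusters[b]) >= min_size:
                continue
            d = core_single_link(clusters[a], clusters[b], points, dens, core_frac)
            if best is None or d < best[0]:
                best = (d, a, b)
        if best is None:  # every remaining pair is protected: stop early
            break
        _, a, b = best
        clusters[a] += clusters[b]  # a < b, so deleting b leaves index a valid
        del clusters[b]
    return clusters


# Two square blobs of five points plus one isolated noise point (index 10).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 0), (10, 1), (11, 0), (11, 1), (10.5, 0.5),
       (5, 0.5)]
print(sorted(sorted(c) for c in cluster(pts, n_clusters=1)))
# → [[0, 1, 2, 3, 4, 10], [5, 6, 7, 8, 9]]
```

Even though `n_clusters=1` asks for a single cluster, the two blobs stay apart once both reach `min_size`, and the noise point is attached to the nearer blob. The pairwise search is deliberately naive (roughly cubic in the number of points); it only shows how the two rules interact and makes no attempt to reproduce the paper's results.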
DOI: 10.1016/j.eswa.2019.03.031