Comparing Two Clusterings Using Matchings between Clusters of Clusters

Bibliographic Details
Title:	Comparing Two Clusterings Using Matchings between Clusters of Clusters
Authors:	Cazals, Frédéric; Mazauric, Dorian; Tetley, Romain; Watrigant, Rémi
Contributors:	Cazals, Frederic
Source:	ACM Journal of Experimental Algorithmics. 24:1-41
Publisher Information:	Association for Computing Machinery (ACM), 2019.
Publication Year:	2019
Subject Terms:	Clustering stability, Comparison of clusterings, Graph decomposition, Dynamic programming algorithms, NP-completeness, [INFO.INFO-CG] Computer Science [cs]/Computational Geometry [cs.CG], 0102 computer and information sciences, 01 natural sciences
Description:	Clustering is a fundamental problem in data science, yet the variety of clustering methods and their sensitivity to parameters make clustering hard. To analyze the stability of a given clustering algorithm while varying its parameters, and to compare clusters yielded by different algorithms, several comparison schemes based on matchings, information theory, and various indices (Rand, Jaccard) have been developed. We go beyond these by providing a novel class of methods computing meta-clusters within each clustering: a meta-cluster is a group of clusters, together with a matching between these. Let the intersection graph of two clusterings be the edge-weighted bipartite graph in which the nodes represent the clusters, the edges represent the nonempty intersection between two clusters, and the weight of an edge is the number of common items. We introduce the so-called D-family-matching problem on intersection graphs, with D the upper bound on the diameter of the graph induced by the clusters of any meta-cluster. First, we prove NP-completeness and APX-hardness results, and show that simple strategies have an unbounded approximation ratio. Second, we design exact polynomial-time dynamic programming algorithms for some classes of graphs (in particular trees). Then we propose efficient spanning-tree-based heuristic algorithms for general graphs. Our experiments illustrate the role of D as a scale parameter providing information on the relationship between clusters within a clustering and between two clusterings. They also show the advantages of our built-in mapping over classical cluster comparison measures such as the variation of information.
Document Type:	Article
Language:	English
ISSN:	1084-6654
DOI:	10.1145/3345951
Access URL:	https://hal.inria.fr/hal-01514872/file/RR-9063-family-matching.pdf https://inria.hal.science/hal-02425599v1 https://doi.org/10.1145/3345951 https://dblp.uni-trier.de/db/journals/jea/jea24.html#CazalsMTW19 https://hal-lara.archives-ouvertes.fr/hal-01514872v2 https://hal.inria.fr/hal-01514872/document https://hal.archives-ouvertes.fr/hal-01514872v1 https://hal.inria.fr/hal-01514872v1 https://dl.acm.org/doi/abs/10.1145/3345951
Rights:	URL: https://www.acm.org/publications/policies/copyright_policy#Background
Accession Number:	edsair.doi.dedup.....726d05f7f18f13c112cc98351a772c0c
Database:	OpenAIRE
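
Illustration:	The intersection graph defined in the Description can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. It assumes each clustering is given as a list of cluster labels, one per item and in the same item order; the function name `intersection_graph` and the toy labels are hypothetical. Each key of the returned dictionary is an edge (a cluster of the first clustering paired with a cluster of the second), and its value is the edge weight, that is, the number of common items. Pairs with an empty intersection do not appear.

```python
from collections import Counter


def intersection_graph(labels_a, labels_b):
    """Return the weighted edges of the bipartite intersection graph.

    labels_a[i] and labels_b[i] are the clusters of item i in the two
    clusterings (assumed input format). The result maps each pair
    (cluster_a, cluster_b) with a nonempty intersection to the number
    of items the two clusters share.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    # Counting the label pairs item by item gives the intersection sizes.
    return dict(Counter(zip(labels_a, labels_b)))


# Toy data (hypothetical): six items, two clusterings.
labels_a = [0, 0, 0, 1, 1, 2]
labels_b = ["x", "x", "y", "y", "y", "z"]
edges = intersection_graph(labels_a, labels_b)
for (cluster_a, cluster_b), weight in sorted(edges.items()):
    print(cluster_a, cluster_b, weight)
```

Because every item belongs to exactly one cluster in each clustering, the edge weights always sum to the number of items.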
