
Comparing two clusterings using matchings between clusters of clusters

Detailed bibliography
Title:	Comparing two clusterings using matchings between clusters of clusters
Authors:	Cazals, Frédéric, Mazauric, Dorian, Tetley, Romain, Watrigant, Rémi
Contributors:	Algorithms, Biology, Structure (ABS), Centre Inria d'Université Côte d'Azur, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA), INRIA Sophia Antipolis - Méditerranée, Universite Cote d'Azur
Source:	https://inria.hal.science/hal-01514872 ; [Research Report] RR-9063, INRIA Sophia Antipolis - Méditerranée; Universite Cote d'Azur. 2017, pp. 1-45.
Publisher information:	CCSD
Year of publication:	2017
Collection:	HAL Université Côte d'Azur
Subjects:	comparison of clusterings, Clustering stability, NP-completeness, graph decomposition, dynamic programming algorithms, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]
Description:	Clustering is a fundamental problem in data science, yet the variety of clustering methods and their sensitivity to parameters make clustering hard. To analyze the stability of a given clustering algorithm while varying its parameters, and to compare clusters yielded by different algorithms, several comparison schemes based on matchings, information theory and various indices (Rand, Jaccard) have been developed. We go beyond these by providing a novel class of methods computing meta-clusters within each clustering: a meta-cluster is a group of clusters, together with a matching between these. Let the intersection graph of two clusterings be the edge-weighted bipartite graph in which the nodes represent the clusters, the edges represent non-empty intersections between two clusters, and the weight of an edge is the number of common items. We introduce the so-called D-family-matching problem on intersection graphs, with D the upper bound on the diameter of the graph induced by the clusters of any meta-cluster. First, we prove NP-completeness results and show that simple strategies have unbounded approximation ratios. Second, we design exact polynomial-time dynamic programming algorithms for some classes of graphs (in particular trees). Then, we provide efficient spanning-tree-based algorithms for general graphs. Our experiments illustrate the role of D as a scale parameter providing information on the relationship between clusters within a clustering and between two clusterings. They also show the advantages of our built-in mapping over classical cluster comparison measures such as the variation of information (VI).
Document type:	report
Language:	English
Availability:	https://inria.hal.science/hal-01514872 https://inria.hal.science/hal-01514872v4/document https://inria.hal.science/hal-01514872v4/file/RR-9063-family-matching.pdf
Rights:	info:eu-repo/semantics/OpenAccess
Accession number:	edsbas.5E06EEEC
Database:	BASE
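The intersection graph defined in the abstract, together with the diameter bound D on a meta-cluster, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the report's implementation: each clustering is assumed to be given as a list of cluster labels (one per item), the names `intersection_graph`, `induced_diameter` and `respects_bound` are hypothetical, and the diameter is taken as the usual unweighted hop-count diameter. The sketch only checks whether one candidate meta-cluster satisfies the bound; it does not solve the D-family-matching problem itself, which the abstract states is NP-complete in general.

```python
from collections import Counter, defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# A node is ("A", label) for a cluster of clustering A, ("B", label) for B,
# so that the two sides of the bipartite graph never collide.
Node = Tuple[str, Hashable]


def intersection_graph(labels_a: List[Hashable],
                       labels_b: List[Hashable]) -> Dict[Tuple[Node, Node], int]:
    """Edge-weighted bipartite intersection graph of two clusterings.

    Item i lies in cluster labels_a[i] of A and labels_b[i] of B (assumed
    input format). An edge joins two clusters iff they share at least one
    item; its weight is the number of shared items.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    counts = Counter(zip(labels_a, labels_b))
    return {(("A", a), ("B", b)): w for (a, b), w in counts.items()}


def induced_diameter(edges: Iterable[Tuple[Node, Node]], nodes: Set[Node]) -> float:
    """Hop-count diameter of the subgraph induced by `nodes` (inf if disconnected)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:  # keep only edges inside the meta-cluster
            adj[u].add(v)
            adj[v].add(u)
    diameter = 0
    for source in nodes:
        # Breadth-first search gives unweighted distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(nodes):
            return float("inf")  # some cluster is unreachable
        diameter = max(diameter, max(dist.values()))
    return diameter


def respects_bound(edges: Iterable[Tuple[Node, Node]], nodes: Set[Node], D: int) -> bool:
    """True if the candidate meta-cluster `nodes` induces a graph of diameter <= D."""
    return induced_diameter(edges, nodes) <= D
```

For example, with `labels_a = [0, 0, 0, 1, 1, 2]` and `labels_b = ["x", "x", "y", "y", "z", "z"]`, the graph is a path and the edge between A-cluster 0 and B-cluster "x" has weight 2. The group {A0, Bx, By} has diameter 2, so it satisfies D = 2 but not D = 1, while {A0, A1} is disconnected. Breadth-first search from every node is quadratic in the meta-cluster size, which is adequate for checking small candidate groups.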
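The variation of information (VI), the baseline measure named in the abstract, depends only on the same intersection counts: with H the entropy of a clustering and I the mutual information between two clusterings of the same n items, VI(A, B) = H(A) + H(B) - 2 I(A; B). A minimal Python sketch, assuming the same label-list input and natural logarithms (nats); the function name `variation_of_information` is hypothetical:

```python
import math
from collections import Counter
from typing import Hashable, List


def variation_of_information(labels_a: List[Hashable],
                             labels_b: List[Hashable]) -> float:
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), in nats (natural logarithm)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty labelings of the same items")
    joint = Counter(zip(labels_a, labels_b))  # |a ∩ b|: intersection graph weights
    size_a = Counter(labels_a)                # |a| for each cluster a of A
    size_b = Counter(labels_b)                # |b| for each cluster b of B

    def entropy(sizes: Counter) -> float:
        return -sum((c / n) * math.log(c / n) for c in sizes.values())

    # Only non-empty intersections (the edges of the intersection graph) contribute.
    mutual_info = sum(
        (w / n) * math.log(w * n / (size_a[a] * size_b[b]))
        for (a, b), w in joint.items()
    )
    return entropy(size_a) + entropy(size_b) - 2.0 * mutual_info
```

VI is 0 exactly when the two clusterings are the same partition (up to relabeling), and two independent balanced 2-cluster labelings such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give 2 ln 2. VI returns a single number and no correspondence between clusters.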