A Mathematical Theory for Clustering in Metric Spaces

Bibliographic details
Published in:	IEEE transactions on network science and engineering, Vol. 3, no. 1, pp. 2-16
Main authors:	Chang, Cheng-Shang; Liao, Wanjiun; Chen, Yu-Sheng; Liou, Li-Heng
Format:	Journal Article
Language:	English
Published:	Piscataway IEEE 01.01.2016 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithm design and analysis; Algorithms; Clustering; Clustering algorithms; convergence; duality; Extraterrestrial measurements; hierarchical algorithms; K-sets; Kernel; Minimization; partitional algorithms; Partitioning algorithms
ISSN:	2327-4697, 2334-329X

Description
Abstract:	Clustering is one of the most fundamental problems in data analysis and it has been studied extensively in the literature. Though many clustering algorithms have been proposed, clustering theories that justify the use of these clustering algorithms are still unsatisfactory. In particular, one of the fundamental challenges is to address the following question: What is a cluster in a set of data points? In this paper, we make an attempt to address such a question by considering a set of data points associated with a distance measure (metric). We first propose a new cohesion measure in terms of the distance measure. Using the cohesion measure, we define a cluster as a set of points that are cohesive to themselves. For such a definition, we show there are various equivalent statements that have intuitive explanations. We then consider the second question: How do we find clusters and good partitions of clusters under such a definition? For such a question, we propose a hierarchical agglomerative algorithm and a partitional algorithm. Unlike standard hierarchical agglomerative algorithms, our hierarchical agglomerative algorithm has a specific stopping criterion and it stops with a partition of clusters. Our partitional algorithm, called the K-sets algorithm in the paper, appears to be a new iterative algorithm. Unlike the Lloyd iteration that needs two-step minimization, our K-sets algorithm only takes one-step minimization. One of the most interesting findings of our paper is the duality result between a distance measure and a cohesion measure. Such a duality result leads to a dual K-sets algorithm for clustering a set of data points with a cohesion measure. The dual K-sets algorithm converges in the same way as a sequential version of the classical kernel K-means algorithm. The key difference is that a cohesion measure does not need to be positive semi-definite.
DOI:	10.1109/TNSE.2016.2516339