From A-to-Z Review of Clustering Validation Indices

Data clustering involves identifying latent similarities within a dataset and organizing them into clusters or groups. The outcomes of various clustering algorithms differ as they are susceptible to the intrinsic characteristics of the original dataset, including noise and dimensionality. The effect...

Full description

Saved in:
Bibliographic Details
Published in:arXiv.org
Main Authors: Hassan, Bryar A, Noor, Bahjat Tay, Hassan, Alla A, Ahmed, Aram M, Rashid, Tarik A, Abdalla, Naz N
Format: Paper
Language:English
Published: Ithaca Cornell University Library, arXiv.org 18.07.2024
Subjects:
ISSN:2331-8422
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Data clustering involves identifying latent similarities within a dataset and organizing them into clusters or groups. The outcomes of various clustering algorithms differ as they are susceptible to the intrinsic characteristics of the original dataset, including noise and dimensionality. The effectiveness of such clustering procedures directly impacts the homogeneity of clusters, underscoring the significance of evaluating algorithmic outcomes. Consequently, the assessment of clustering quality presents a significant and complex endeavor. A pivotal aspect affecting clustering validation is the cluster validity metric, which aids in determining the optimal number of clusters. The main goal of this study is to comprehensively review and explain the mathematical operation of internal and external cluster validity indices, but not all, to categorize these indices and to brainstorm suggestions for future advancement of clustering validation research. In addition, we review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms, such as the evolutionary clustering algorithm star (ECA*). Finally, we suggest a classification framework for examining the functionality of both internal and external clustering validation measures regarding their ideal values, user-friendliness, responsiveness to input data, and appropriateness across various fields. This classification aids researchers in selecting the appropriate clustering validation measure to suit their specific requirements.
Bibliography:SourceType-Working Papers-1
ObjectType-Working Paper/Pre-Print-1
content type line 50
ISSN:2331-8422
DOI:10.48550/arxiv.2407.20246