A Complete Linkage Algorithm for Clustering Dynamic Datasets

Bibliographic Details
Published in: Proceedings of the National Academy of Sciences, India, Section A, Physical Sciences Vol. 94; no. 5; pp. 471-486
Main Authors: Banerjee, Payel, Chakrabarti, Amlan, Ballabh, Tapas Kumar
Format: Journal Article
Language: English
Published: New Delhi: Springer India, 01.11.2024
ISSN: 0369-8203, 2250-1762
Description
Summary: In recent years, a central challenge for data scientists has been analyzing very large volumes of data that arrive at high speed. Such data is not only difficult to collect but also costly in time and memory to process. Clustering is a well-known approach to this problem: it helps shrink the database and yields valuable insights from a completely unlabelled dataset. Complete Linkage Clustering is a well-known hierarchical clustering algorithm that produces small, highly cohesive clusters but suffers from a long convergence time. Traditional methods require the complete dataset in advance to make a clustering decision, which makes them unsuitable for large datasets and for dynamic datasets where new data points are added frequently, because every addition forces the entire dataset to be processed again. Our paper presents a fast Complete Linkage Clustering algorithm that uses the triangle inequality to avoid many redundant distance calculations, making the algorithm faster and suitable for clustering both large and dynamic databases. Experiments were conducted on various real-world datasets, and the Adjusted Rand Index was used to compare the results with those of the original Complete Linkage algorithm. The experimental results confirm the effectiveness of our algorithm for both static and dynamic databases.
DOI: 10.1007/s40010-024-00894-8
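
Illustration: The summary says that the triangle inequality lets the algorithm skip redundant distance calculations, but this record does not give the authors' procedure. The sketch below is therefore not the paper's algorithm. It shows one generic way such pruning can work when a newly arrived point is assigned to its nearest cluster under the complete-linkage (farthest-member) distance. The names `make_cluster` and `nearest_cluster`, the choice of the first member as pivot, and the use of Euclidean distance are all assumptions made for this example.

```python
import math

def make_cluster(points):
    """Hypothetical helper: keep a pivot (the first member) and the
    largest pivot-to-member distance, computed once per cluster."""
    pivot = points[0]
    radius = max(math.dist(pivot, p) for p in points)
    return {"pivot": pivot, "members": list(points), "radius": radius}

def nearest_cluster(x, clusters):
    """Return (cluster index, complete-linkage distance, distance evaluations)
    for a new point x. Assumes Euclidean distance (any metric would do)."""
    best_idx, best_d, evaluations = None, math.inf, 0
    for i, c in enumerate(clusters):
        d_xp = math.dist(x, c["pivot"])
        evaluations += 1
        # Triangle inequality: d(x, m) >= |d(x, p) - d(p, m)| for each member m,
        # so the farthest member is at least max(d_xp, radius - d_xp) away.
        lower = max(d_xp, c["radius"] - d_xp)
        if lower >= best_d:
            continue  # this cluster cannot beat the best one found so far
        d_max = d_xp
        for m in c["members"][1:]:
            d_max = max(d_max, math.dist(x, m))
            evaluations += 1
            if d_max >= best_d:
                break  # already no better than the current best
        if d_max < best_d:
            best_idx, best_d = i, d_max
    return best_idx, best_d, evaluations

# Usage: two well-separated clusters and one newly arrived point.
a = make_cluster([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
b = make_cluster([(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)])
idx, d, n = nearest_cluster((0.5, 0.5), [a, b])
print(idx, round(d, 4), n)  # 0 0.7071 4 (brute force would need 6 evaluations)
```

In this toy case the second cluster is rejected after a single distance evaluation, because the bound from its pivot already exceeds the best complete-linkage distance found. How much work is saved in practice depends on the data and on the order in which clusters are visited.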