A Topological-Indicators-Based k-Means Clustering Algorithm and Its Application in Time Series Data: A Case Study on Sea Level Variability in Peninsular Malaysia

Bibliographic Details
Published in:	IEEE Access, Vol. 13, pp. 46514-46533
Main Authors:	Lin, Zixin; Zulkepli, Nur Fariha Syaqina; Bin Mohd Kasihmuddin, Mohd Shareduwan; Gobithaasan, Rudrusamyr
Format:	Journal Article
Language:	English
Published:	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
Subjects:	Algorithms; Analytical models; Biological system modeling; Cluster analysis; Clustering; Clustering algorithms; Computational topology; Data analysis; Data points; Datasets; Environmental sustainability; Homology; Machine learning; Noise; Point cloud compression; Sea level; Sea level variability; Sea measurements; Tides; Time series; Time series analysis; Topology; Vector quantization; Vectors; Malaysia
ISSN:	2169-3536

Description
Summary:	Traditional k-means clustering is widely used to analyze regional and temporal variations in time series data, such as sea levels. However, its accuracy can degrade on datasets with mixed groups or significant noise. In this study, we analyzed monthly sea level data derived from daily time series at 14 tide gauge stations along the coastline of Peninsular Malaysia. To enhance traditional k-means clustering, we propose a hybrid approach that combines clustering techniques with topological data analysis (TDA). Specifically, we integrate k-means and its variant, k-means++, with persistent homology, the primary tool in TDA, to capture topological insights from the datasets. The proposed approach clusters the 14 tide gauge stations based on predefined topological features and computes the probability that data points from each station belong to specific clusters. The results demonstrate that incorporating topological information significantly improves clustering performance compared with traditional k-means applied without it.
DOI:	10.1109/ACCESS.2025.3548558
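
Illustration:	The summary describes combining k-means++ with persistent homology. The sketch below is one minimal way to picture that idea and is not the authors' implementation. It makes several assumptions that do not come from this record: only 0-dimensional persistent homology is used (computed through the minimum spanning tree of a Vietoris-Rips filtration), each series is turned into a point cloud by a two-dimensional delay embedding with a lag of 3, the two features (total and maximum lifetime) are chosen purely for illustration, and the four synthetic sine series stand in for real tide gauge data. All function names (h0_death_times, delay_embed, topo_features, kmeans_pp) are hypothetical. The per-station cluster probabilities mentioned in the summary are not covered.

```python
import math
import random


def h0_death_times(points):
    """Finite death times of 0-dimensional persistent homology (Vietoris-Rips).

    Every point is born at scale 0; a component dies when it merges into
    another one, which happens at the minimum-spanning-tree edge lengths.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one class dies at d
            parent[ri] = rj
            deaths.append(d)
    return deaths


def delay_embed(series, dim=2, lag=3):
    """Turn a 1-D series into a point cloud (assumed embedding, for illustration)."""
    span = (dim - 1) * lag
    return [tuple(series[t + k * lag] for k in range(dim))
            for t in range(len(series) - span)]


def topo_features(series):
    """Illustrative topological indicators: total and maximum H0 lifetime."""
    deaths = h0_death_times(delay_embed(series))
    return (sum(deaths), max(deaths))


def assign(X, centers):
    """Index of the nearest center for every feature vector."""
    return [min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
            for x in X]


def kmeans_pp(X, k, iters=100, seed=0):
    """Lloyd's k-means with k-means++ seeding on a list of feature tuples."""
    rng = random.Random(seed)
    centers = [rng.choice(X)]
    while len(centers) < k:
        # Sample the next center with probability proportional to squared distance.
        d2 = [min(math.dist(x, c) ** 2 for c in centers) for x in X]
        centers.append(rng.choices(X, weights=d2, k=1)[0])
    for _ in range(iters):
        labels = assign(X, centers)
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return assign(X, centers), centers


# Synthetic stand-in data: four "stations" with a 12-month cycle, 36 months each.
amplitudes = [1.0, 1.2, 5.0, 5.5]
series = [[a * math.sin(2 * math.pi * t / 12) for t in range(36)]
          for a in amplitudes]
features = [topo_features(s) for s in series]
labels, centers = kmeans_pp(features, k=2)
print(labels)
```

In this toy setting the two low-amplitude series end up in one cluster and the two high-amplitude series in the other, because the H0 lifetimes scale with the size of the embedded loop. Which cluster receives index 0 depends on the seed. Higher-dimensional homology (loops and voids) would normally be computed with a dedicated persistent homology library rather than the spanning-tree shortcut used here.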