A Topological-Indicators-Based k-Means Clustering Algorithm and Its Application in Time Series Data: A Case Study on Sea Level Variability in Peninsular Malaysia

Bibliographic Details
Published in: IEEE Access, Vol. 13, pp. 46514-46533
Main Authors: Lin, Zixin, Zulkepli, Nur Fariha Syaqina, Bin Mohd Kasihmuddin, Mohd Shareduwan, Gobithaasan, Rudrusamyr
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2169-3536
Description
Summary: Traditional k-means clustering is widely used to analyze regional and temporal variations in time series data, such as sea levels. However, its accuracy can be affected by limitations, particularly when applied to datasets with mixed groups or significant noise. In this study, we analyzed monthly sea level data derived from daily time series at 14 tide gauge stations along the coastline of Peninsular Malaysia. To enhance traditional k-means clustering, we propose a hybrid approach that combines clustering techniques with topological data analysis (TDA). Specifically, we integrate k-means and its variant, k-means++, with persistent homology, the primary tool in TDA, to capture topological insights from the datasets. The proposed approach clusters the 14 tide gauge stations based on predefined topological features, and the probability of data points from each station belonging to specific clusters is computed. The results demonstrate that incorporating topological information significantly improves the performance of traditional k-means clustering compared with clustering that does not use such information.
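Illustration: The summary does not state which topological features the authors use or how persistent homology is computed, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes 0-dimensional sublevel-set persistence of each one-dimensional series, a hypothetical feature vector (total and maximum persistence), and plain k-means with k-means++ seeding. The station names and values are toy data, not the Peninsular Malaysia tide gauge records, and the cluster-membership probabilities mentioned in the summary are not reproduced.

```python
import random

def persistence_0d(series):
    """0-dimensional sublevel-set persistence pairs (birth, death) of a 1-D series."""
    n = len(series)
    parent = [None] * n   # None marks a sample not yet added to the filtration
    birth = [0.0] * n     # birth value, meaningful at component roots

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    # Add samples in order of increasing value (the sublevel-set filtration).
    for i in sorted(range(n), key=lambda j: series[j]):
        parent[i] = i
        birth[i] = series[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                # Elder rule: the component born later dies at the current value.
                young, old = (a, b) if birth[a] > birth[b] else (b, a)
                if series[i] > birth[young]:
                    pairs.append((birth[young], series[i]))
                parent[young] = old
    return pairs   # the essential (never-dying) component is omitted

def topo_features(series):
    """Hypothetical feature vector: [total persistence, maximum persistence]."""
    lifetimes = [d - b for b, d in persistence_0d(series)]
    return [sum(lifetimes), max(lifetimes, default=0.0)]

def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans_pp(points, k, iters=50, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Sample the next center with probability proportional to squared distance.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2)[0]))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])   # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Toy series for four hypothetical stations (not real tide gauge data).
stations = {
    "A": [0, 5, 0, 5, 0, 5, 0],
    "B": [0, 6, 1, 6, 0, 5, 0],
    "C": [0, 1, 0, 1, 0, 1, 0],
    "D": [0, 1, 0.5, 1, 0, 1, 0],
}
features = [topo_features(s) for s in stations.values()]
labels, centers = kmeans_pp(features, k=2)
for name, lab in zip(stations, labels):
    print(name, lab)
```

In this toy setup the two large-amplitude series (A, B) and the two small-amplitude series (C, D) fall into separate clusters, because their persistence-based features differ by an order of magnitude. Which cluster receives label 0 depends on the random seed.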
DOI: 10.1109/ACCESS.2025.3548558