Cost-effective and adaptive clustering algorithm for stream processing on cloud system


Bibliographic details
Published in: GeoInformatica Vol. 27; no. 1; pp. 1 - 21
Main authors: Xia, Yue; Fang, Junhua; Chao, Pingfu; Pan, Zhicheng; Shang, Jedi S.
Format: Journal Article
Language: English
Published: New York Springer US 01.01.2023
Springer
Springer Nature B.V
ISSN:1384-6175, 1573-7624
Description
Summary: Clustering is a fundamental operation that plays an essential role in data management and analysis. Clustering algorithms have been well studied over the past two decades, but real-time clustering has yet to mature in practice. For applications built on clustering, capturing the dynamic changes of clusters and the trends of moving objects in real time can maximize the value of the data. Although a DSPE (Distributed Stream Processing Engine) is capable of such workloads, it still suffers from fixed window sizes and wasted computational resources. In this paper, we introduce a new Cost-effective and Adaptive Clustering method (CeAC), which improves computational efficiency while preserving the accuracy of the clustering result. Specifically, we design a composite window model that contains the latest data records and maintains historical states. To achieve lightweight clustering, we propose a fully online clustering algorithm based on grid density, which can capture clusters of arbitrary shape and effectively handle outliers in parallel. We further introduce an adaptive calculation model that accelerates the clustering operation by shedding workload according to the characteristics of the incoming data. Experimental results show that the proposed method is accurate and efficient in real-time data stream clustering.
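The summary mentions clustering based on grid density without giving details, so the following is only a minimal sketch of the general idea, not the CeAC algorithm itself. It assumes 2-D points, a fixed cell size, and a simple count threshold for "dense" cells; the function name `grid_density_clusters` and the parameters `cell_size` and `min_points` are hypothetical. It runs on a single batch and does not model the composite window or the workload shedding described above.

```python
from collections import defaultdict, deque


def grid_density_clusters(points, cell_size, min_points):
    """Group 2-D points into clusters of adjacent dense grid cells.

    Illustrative only: a generic grid-density scheme, not the method
    from the paper. Returns (clusters, outliers).
    """
    # Assign every point to the grid cell that contains it.
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

    # A cell is "dense" when it holds at least min_points points
    # (assumed threshold rule).
    dense = {cell for cell, pts in cells.items() if len(pts) >= min_points}

    clusters = []
    seen = set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over the 8-neighbourhood of dense cells,
        # so a cluster can take an arbitrary connected shape.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        clusters.append(members)

    # Points in sparse cells are reported as outliers.
    outliers = [p for cell, pts in cells.items() if cell not in dense for p in pts]
    return clusters, outliers
```

Working on cells rather than on individual points keeps the cost of cluster formation tied to the number of occupied cells, which is one reason grid-based schemes are commonly considered for streaming workloads.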
DOI:10.1007/s10707-021-00442-1