An ensemble agglomerative hierarchical clustering algorithm based on clusters clustering technique and the novel similarity measurement

The advent of architectures such as the Internet of Things (IoT) has led to the dramatic growth of data and the production of big data. Managing this often-unlabeled data is a big challenge for the real world. Hierarchical Clustering (HC) is recognized as an efficient unsupervised approach to unlabe...

Bibliographic details
Published in: Journal of King Saud University. Computer and information sciences, Vol. 34, No. 6, pp. 3828-3842
Main authors: Teng Li, Amin Rezaeipanah, ElSayed M. Tag El Din
Format: Journal Article
Language: English
Published: Springer, 1 June 2022
ISSN:1319-1578
Abstract: The advent of architectures such as the Internet of Things (IoT) has led to dramatic growth in data and the production of big data. Managing this often-unlabeled data is a major practical challenge. Hierarchical Clustering (HC) is recognized as an efficient unsupervised approach to unlabeled data analysis. In data mining, HC is a mechanism for grouping data at different scales by creating a dendrogram. One of the most common HC methods is Agglomerative Hierarchical Clustering (AHC), in which clusters are created bottom-up. In addition, ensemble clustering approaches are now used for complex problems because of the weaknesses of individual clustering methods. Accordingly, we propose a clustering framework using AHC methods based on ensemble approaches, which includes the clusters clustering technique and a novel similarity measurement. The proposed algorithm is a Meta-Clustering Ensemble scheme based on Model Selection (MCEMS). MCEMS uses a bi-weighting policy to solve the problem associated with model selection and thereby improve ensemble clustering. Specifically, multiple individual AHC methods cluster the data from different aspects to form the primary clusters. Based on the results of the different methods, the similarity between instances is calculated using a novel similarity measurement. The MCEMS scheme creates meta-clusters by re-clustering the primary clusters. After clusters clustering, the optimal number of clusters is determined by merging similar clusters subject to a threshold. Finally, the similarity of each instance to the meta-clusters is calculated, and each instance is assigned to the meta-cluster with the highest similarity to form the final clusters. Simulations have been performed on several datasets from the UCI repository to evaluate the MCEMS scheme against state-of-the-art algorithms. Extensive experiments clearly prove the superiority of MCEMS over the HMM, DSPA and WHAC algorithms based on the Wilcoxon test and the cophenetic correlation coefficient.
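The pipeline outlined in the abstract (primary clusters from several AHC methods, re-clustering of those clusters into meta-clusters, then assignment of each instance to its most similar meta-cluster) can be illustrated with a minimal Python sketch. This is not the authors' MCEMS implementation: the bi-weighting policy and the novel similarity measurement are not specified in this record, so the sketch substitutes Jaccard similarity between primary clusters and a simple membership vote. The function names, the choice of linkage methods, the toy data, and the merge threshold of 0.5 are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist


def primary_clusters(X, methods=("single", "complete", "average", "ward"), k=3):
    """Cut one dendrogram per linkage method into at most k clusters.

    Returns a boolean matrix with one row per primary cluster and one
    column per instance (True where the instance belongs to the cluster).
    """
    rows = []
    for method in methods:
        labels = fcluster(linkage(X, method=method), t=k, criterion="maxclust")
        for c in np.unique(labels):
            rows.append(labels == c)
    return np.array(rows)


def meta_cluster(H, threshold=0.5):
    """Cluster the clusters: merge primary clusters whose Jaccard distance
    stays within the threshold (a stand-in for the paper's own measure)."""
    Z = linkage(pdist(H, metric="jaccard"), method="average")
    return fcluster(Z, t=threshold, criterion="distance")


def assign_instances(H, meta_labels):
    """Assign each instance to the meta-cluster whose member clusters
    contain it most often (simple vote, not the bi-weighting policy)."""
    metas = np.unique(meta_labels)
    score = np.array([H[meta_labels == g].mean(axis=0) for g in metas])
    return metas[score.argmax(axis=0)]


# Toy data (assumed): three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in (0.0, 5.0, 10.0)])

H = primary_clusters(X, k=3)          # 4 methods x 3 clusters = 12 primary clusters
meta_labels = meta_cluster(H)         # group equivalent primary clusters
final = assign_instances(H, meta_labels)
print(len(np.unique(final)), "final clusters")
```

In this sketch the number of final clusters is not fixed in advance; it follows from the distance threshold applied when merging primary clusters, which loosely mirrors the threshold-based merging step described in the abstract.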
Authors and affiliations:
1. Teng Li, Artificial Intelligence and Big Data College, Chongqing College of Electronic Engineering, Chongqing 401331, China (corresponding author)
2. Amin Rezaeipanah, Department of Computer Engineering, Persian Gulf University, Bushehr, Iran (corresponding author)
3. ElSayed M. Tag El Din, Electrical Engineering Department, Faculty of Engineering and Technology, Future University in Egypt, New Cairo 11845, Egypt
DOI 10.1016/j.jksuci.2022.04.010
Discipline Computer Science
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
OpenAccessLink https://doaj.org/article/35aaf34f0bd248bbb60a2a8667f37387
Subject terms: Clusters clustering; Ensemble clustering; Hierarchical clustering; Meta-clusters; Model selection; Similarity measurement