An Adaptive Clustering Algorithm Based on Local-Density Peaks for Imbalanced Data Without Parameters

Detailed bibliography
Published in: IEEE transactions on knowledge and data engineering, Vol. 35, No. 4, pp. 3419-3432
Main authors: Tong, Wuning; Wang, Yuping; Liu, Delong
Format: Journal Article
Language: English
Publisher: New York: IEEE, 01.04.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1041-4347, 1558-2191
Abstract Imbalanced data clustering is a challenging problem in machine learning. The main difficulty is caused by the imbalance in both cluster size and data density distribution. To address this problem, we propose a novel clustering algorithm called LDPI based on local-density peaks in this study. First, an initial sub-cluster construction scheme is designed based on a 3-dimensional (3-D) decision graph that can easily detect the initial sub-cluster centers and identify the noise points. Second, a sub-cluster updating strategy is designed, which can automatically identify the false sub-cluster centers and update the initial sub-clusters. Third, a sub-cluster merging scheme is designed, which merges the updated initial sub-clusters into final clusters. Consequently, the proposed algorithm has three advantages: 1) It does not require any input parameters; 2) It can automatically determine the cluster centers and number of clusters; 3) It is suitable for imbalanced datasets and datasets with arbitrary shapes and distributions. The effectiveness of LDPI is demonstrated experimentally and the superiority of LDPI is identified by comparison with 5 state-of-the-art algorithms.
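LDPI builds on local-density peaks. For readers unfamiliar with that idea, the following is a minimal sketch of the classic density-peaks clustering procedure (Rodriguez and Laio, Science, 2014) that such methods start from; it is not LDPI, whose 3-D decision graph, sub-cluster updating and merging are not reproduced here. The sketch computes each point's local density rho and its distance delta to the nearest denser point (the two axes of the classic 2-D decision graph), picks centers by the decision value gamma = rho * delta, and assigns every other point the label of its nearest denser neighbour. The function name `density_peaks`, the Gaussian-kernel density variant, the cutoff `d_c` and the fixed `n_centers` are illustrative assumptions; the last two are exactly the kind of input parameters LDPI is described as eliminating.

```python
import math
from typing import List, Sequence, Tuple


def density_peaks(
    points: Sequence[Sequence[float]], d_c: float, n_centers: int
) -> Tuple[List[float], List[float], List[int]]:
    """Classic density-peaks clustering (Rodriguez & Laio, 2014), not LDPI.

    Returns (rho, delta, labels). d_c and n_centers are user-supplied
    parameters in this hypothetical sketch; assumes n_centers >= 1.
    """
    n = len(points)
    # Pairwise Euclidean distances (O(n^2) memory, fine for a sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: Gaussian-kernel variant, self excluded.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n  # nearest point that precedes i in `order`
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention delta is its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Decision value gamma = rho * delta. The densest point is always made
    # a center so that every other point has a labelled ancestor.
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: -gamma[i])
    centers = [order[0]] + [i for i in ranked if i != order[0]][: n_centers - 1]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Non-centers inherit the label of their nearest denser neighbour;
    # processing in density order guarantees that parent is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    rho, delta, labels = density_peaks(pts, d_c=1.0, n_centers=2)
    print(labels)
```

On the two well-separated toy blobs in the `__main__` block, each blob's central point becomes a center and the four corners of each blob inherit its label. Note that `n_centers` is chosen by hand here; according to the abstract, determining the centers and the number of clusters automatically is one of the steps LDPI is designed to handle.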
Author Wang, Yuping
Liu, Delong
Tong, Wuning
Author_xml – sequence: 1
  givenname: Wuning
  surname: Tong
  fullname: Tong, Wuning
  email: tongwuning@sntcm.edu.cn
  organization: School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
– sequence: 2
  givenname: Yuping
  orcidid: 0000-0001-6868-0004
  surname: Wang
  fullname: Wang, Yuping
  email: ywang@xidian.edu.cn
  organization: School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
– sequence: 3
  givenname: Delong
  surname: Liu
  fullname: Liu, Delong
  email: dlliu_1@stu.xidian.edu.cn
  organization: School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
CODEN ITKEEH
CitedBy_id crossref_primary_10_1016_j_energy_2024_130770
crossref_primary_10_1109_TKDE_2024_3392953
crossref_primary_10_1016_j_neucom_2025_131482
crossref_primary_10_1109_TIFS_2025_3607231
crossref_primary_10_1109_TKDE_2023_3266648
crossref_primary_10_1109_TKDE_2023_3312760
crossref_primary_10_15622_ia_24_2_1
crossref_primary_10_1016_j_patcog_2025_111878
crossref_primary_10_1080_10447318_2024_2387421
crossref_primary_10_1109_TNNLS_2025_3547362
crossref_primary_10_1038_s41598_025_16319_4
crossref_primary_10_3233_ICA_220682
crossref_primary_10_1016_j_asoc_2025_113901
crossref_primary_10_1109_ACCESS_2024_3404917
crossref_primary_10_1007_s11634_024_00611_8
crossref_primary_10_1016_j_ins_2024_120685
crossref_primary_10_1016_j_knosys_2025_114097
crossref_primary_10_1109_TNNLS_2025_3563769
crossref_primary_10_1007_s10586_025_05225_z
Cites_doi 10.1016/S0305-0548(01)00043-0
10.1016/j.patrec.2016.05.007
10.1016/j.is.2006.10.006
10.1016/j.neucom.2016.01.102
10.1007/s00779-016-0954-4
10.1109/TKDE.2005.184
10.1109/TIT.1967.1053964
10.1109/IJCNN.2003.1223306
10.1016/j.ins.2018.03.031
10.1109/ICPR.2010.1053
10.1109/TKDE.2012.232
10.1109/IIKI.2015.62
10.1145/235968.233324
10.1109/TKDE.2017.2787640
10.1016/j.neucom.2020.03.125
10.1007/s11280-012-0178-0
10.1049/cje.2016.05.001
10.1109/MC.2004.1297301
10.1111/j.0824-7935.2004.t01-1-00228.x
10.1126/science.1242072
10.1109/TBME.2017.2655364
10.1016/j.knosys.2019.06.032
10.4156/jcit.vol6.issue1.8
10.1016/j.knosys.2016.02.001
10.1186/s40537-019-0192-5
10.1109/ICDM.2006.9
10.1109/TFUZZ.2011.2182354
10.1109/LARS.2009.5418323
10.1016/j.ipm.2020.102388
10.1007/s13042-013-0177-1
10.1109/TKDE.2005.201
10.1177/1460458218796636
10.1109/ICARCV.2014.7064454
10.1007/s10489-018-1238-7
10.1007/BF00114265
10.1109/TCYB.2019.2916196
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
DOI 10.1109/TKDE.2021.3138962
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList
Technology Research Database
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1558-2191
EndPage 3432
ExternalDocumentID 10_1109_TKDE_2021_3138962
9664331
Genre orig-research
GrantInformation_xml – fundername: National Key Research and Development Program of China
  grantid: 2017YFC1703506
– fundername: National Natural Science Foundation of China
  grantid: 61872281
  funderid: 10.13039/501100001809
GroupedDBID -~X
.DC
0R~
29I
4.4
5GY
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACGFO
ACIWK
AENEX
AGQYO
AHBIQ
AKJIK
AKQYR
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
EBS
EJD
F5P
HZ~
IEDLZ
IFIPE
IPLJI
JAVBF
LAI
M43
MS~
O9-
OCL
P2P
PQQKQ
RIA
RIE
RNS
RXW
TAE
TN5
UHB
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
ID FETCH-LOGICAL-c2440-63bfcd8cd6be9e497437a48517eea2d3874573b4d877d0fe403cb35a3c0878e23
IEDL.DBID RIE
ISSN 1041-4347
IngestDate Mon Jun 30 02:41:34 EDT 2025
Sat Nov 29 02:36:04 EST 2025
Tue Nov 18 21:15:23 EST 2025
Wed Aug 27 02:18:06 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c2440-63bfcd8cd6be9e497437a48517eea2d3874573b4d877d0fe403cb35a3c0878e23
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0001-6868-0004
PQID 2784549668
PQPubID 85438
PageCount 14
ParticipantIDs crossref_primary_10_1109_TKDE_2021_3138962
ieee_primary_9664331
crossref_citationtrail_10_1109_TKDE_2021_3138962
proquest_journals_2784549668
PublicationCentury 2000
PublicationDate 2023-04-01
PublicationDateYYYYMMDD 2023-04-01
PublicationDate_xml – month: 04
  year: 2023
  text: 2023-04-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on knowledge and data engineering
PublicationTitleAbbrev TKDE
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References_xml – ident: ref3
  doi: 10.1016/S0305-0548(01)00043-0
– ident: ref38
  doi: 10.1016/j.patrec.2016.05.007
– ident: ref22
  doi: 10.1016/j.is.2006.10.006
– ident: ref20
  doi: 10.1016/j.neucom.2016.01.102
– start-page: 226
  year: 1996
  ident: ref9
  article-title: A density-based algorithm for discovering clusters in large spatial databases with noise
– ident: ref25
  doi: 10.1007/s00779-016-0954-4
– ident: ref30
  doi: 10.1109/TKDE.2005.184
– ident: ref32
  doi: 10.1109/TIT.1967.1053964
– ident: ref5
  doi: 10.1109/IJCNN.2003.1223306
– ident: ref23
  doi: 10.1016/j.ins.2018.03.031
– ident: ref6
  doi: 10.1109/ICPR.2010.1053
– volume: 30
  start-page: 25
  issue: 1
  year: 2006
  ident: ref16
  article-title: Handling imbalanced datasets: A review
  publication-title: GESTS Int. Trans. Comput. Sci. Eng.
– ident: ref17
  doi: 10.1109/TKDE.2012.232
– ident: ref24
  doi: 10.1109/IIKI.2015.62
– ident: ref8
  doi: 10.1145/235968.233324
– ident: ref26
  doi: 10.1109/TKDE.2017.2787640
– ident: ref37
  doi: 10.1016/j.neucom.2020.03.125
– ident: ref14
  doi: 10.1007/s11280-012-0178-0
– ident: ref21
  doi: 10.1049/cje.2016.05.001
– ident: ref13
  doi: 10.1109/MC.2004.1297301
– ident: ref18
  doi: 10.1111/j.0824-7935.2004.t01-1-00228.x
– ident: ref19
  doi: 10.1126/science.1242072
– ident: ref1
  doi: 10.1109/TBME.2017.2655364
– ident: ref35
  doi: 10.1016/j.knosys.2019.06.032
– volume: 6
  start-page: 62
  issue: 1
  year: 2011
  ident: ref4
  article-title: Research on credit card fraud detection model based on class weighted support vector machine
  publication-title: J. Convergence Inf. Technol.
  doi: 10.4156/jcit.vol6.issue1.8
– ident: ref34
  doi: 10.1016/j.knosys.2016.02.001
– ident: ref15
  doi: 10.1186/s40537-019-0192-5
– start-page: 281
  volume-title: Proc. 5th Berkeley Symp. Math. Statist. Probability
  ident: ref7
  article-title: Some methods for classification and analysis of multivariate observations
– year: 2017
  ident: ref39
  article-title: UCI machine learning repository
– year: 2017
  ident: ref40
  article-title: PR-Tools4.1, a matlab toolbox for pattern recognition
– ident: ref36
  doi: 10.1109/ICDM.2006.9
– ident: ref28
  doi: 10.1109/TFUZZ.2011.2182354
– ident: ref42
  doi: 10.1109/LARS.2009.5418323
– ident: ref27
  doi: 10.1016/j.ipm.2020.102388
– ident: ref33
  doi: 10.1007/s13042-013-0177-1
– ident: ref2
  doi: 10.1109/TKDE.2005.201
– ident: ref12
  doi: 10.1177/1460458218796636
– ident: ref29
  doi: 10.1109/ICARCV.2014.7064454
– year: 2018
  ident: ref41
  article-title: K-means properties on six clustering benchmark datasets
  doi: 10.1007/s10489-018-1238-7
– ident: ref11
  doi: 10.1007/BF00114265
– start-page: 186
  volume-title: Proc. 23rd Int. Conf. Very Large Data Bases
  ident: ref10
  article-title: STING: A statistical information grid approach to spatial data mining
– ident: ref31
  doi: 10.1109/TCYB.2019.2916196
SSID ssj0008781
Score 2.5231884
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 3419
SubjectTerms Adaptive algorithms
Algorithms
Clustering
Clustering algorithms
Clustering methods
Computer science
Data clustering
Datasets
Density distribution
density peaks
imbalanced data
Machine learning
Machine learning algorithms
multiple centers
Parameters
Shape
Task analysis
Title An Adaptive Clustering Algorithm Based on Local-Density Peaks for Imbalanced Data Without Parameters
URI https://ieeexplore.ieee.org/document/9664331
https://www.proquest.com/docview/2784549668
Volume 35
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE/IET Electronic Library (IEL)
  customDbUrl:
  eissn: 1558-2191
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0008781
  issn: 1041-4347
  databaseCode: RIE
  dateStart: 19890101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+Adaptive+Clustering+Algorithm+Based+on+Local-Density+Peaks+for+Imbalanced+Data+Without+Parameters&rft.jtitle=IEEE+transactions+on+knowledge+and+data+engineering&rft.au=Tong%2C+Wuning&rft.au=Wang%2C+Yuping&rft.au=Liu%2C+Delong&rft.date=2023-04-01&rft.issn=1041-4347&rft.eissn=1558-2191&rft.volume=35&rft.issue=4&rft.spage=3419&rft.epage=3432&rft_id=info:doi/10.1109%2FTKDE.2021.3138962&rft.externalDBID=n%2Fa&rft.externalDocID=10_1109_TKDE_2021_3138962