An Adaptive Clustering Algorithm Based on Local-Density Peaks for Imbalanced Data Without Parameters
| Published in: | IEEE Transactions on Knowledge and Data Engineering, Vol. 35, No. 4, pp. 3419-3432 |
|---|---|
| Main authors: | Tong, Wuning; Wang, Yuping; Liu, Delong |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 April 2023 |
| Subjects: | |
| ISSN: | 1041-4347, 1558-2191 |
| Abstract | Imbalanced data clustering is a challenging problem in machine learning. The main difficulty is caused by the imbalance in both cluster size and data density distribution. To address this problem, we propose a novel clustering algorithm called LDPI based on local-density peaks in this study. First, an initial sub-cluster construction scheme is designed based on a 3-dimensional (3-D) decision graph that can easily detect the initial sub-cluster centers and identify the noise points. Second, a sub-cluster updating strategy is designed, which can automatically identify the false sub-cluster centers and update the initial sub-clusters. Third, a sub-cluster merging scheme is designed, which merges the updated initial sub-clusters into final clusters. Consequently, the proposed algorithm has three advantages: 1) It does not require any input parameters; 2) It can automatically determine the cluster centers and number of clusters; 3) It is suitable for imbalanced datasets and datasets with arbitrary shapes and distributions. The effectiveness of LDPI is demonstrated experimentally and the superiority of LDPI is identified by comparison with 5 state-of-the-art algorithms. |
|---|---|
| Author | Wang, Yuping; Liu, Delong; Tong, Wuning |
| Author_xml | 1. Tong, Wuning (tongwuning@sntcm.edu.cn), School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China; 2. Wang, Yuping (ywang@xidian.edu.cn, ORCID 0000-0001-6868-0004), School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China; 3. Liu, Delong (dlliu_1@stu.xidian.edu.cn), School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China |
| CODEN | ITKEEH |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
| DOI | 10.1109/TKDE.2021.3138962 |
| Discipline | Engineering; Computer Science |
| EISSN | 1558-2191 |
| EndPage | 3432 |
| Genre | orig-research |
| GrantInformation_xml | National Key Research and Development Program of China (grant 2017YFC1703506); National Natural Science Foundation of China (grant 61872281, funder ID 10.13039/501100001809) |
| ISSN | 1041-4347 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0001-6868-0004 |
| PageCount | 14 |
| PublicationDate | 2023-04-01 |
| PublicationPlace | New York |
| PublicationTitle | IEEE Transactions on Knowledge and Data Engineering |
| PublicationTitleAbbrev | TKDE |
| PublicationYear | 2023 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 3419 |
| SubjectTerms | Adaptive algorithms; Algorithms; Clustering; Clustering algorithms; Clustering methods; Computer science; Data clustering; Datasets; Density distribution; density peaks; imbalanced data; Machine learning; Machine learning algorithms; multiple centers; Parameters; Shape; Task analysis |
| Title | An Adaptive Clustering Algorithm Based on Local-Density Peaks for Imbalanced Data Without Parameters |
| URI | https://ieeexplore.ieee.org/document/9664331 https://www.proquest.com/docview/2784549668 |
| Volume | 35 |
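
The abstract's first step builds a decision graph from local-density peaks. As a rough illustration only, the sketch below computes the two quantities used in classic density-peaks clustering: a local density `rho` and the distance `delta` from each point to its nearest neighbor of higher density. This is not the LDPI algorithm. The record does not state what the three axes of the 3-D decision graph are, and the function name `density_peak_scores`, the Gaussian kernel, and the `cutoff` bandwidth are all assumptions made here for illustration (LDPI itself is described as requiring no input parameters).

```python
import numpy as np


def density_peak_scores(points, cutoff):
    """Return (rho, delta) per point, in the style of classic density peaks.

    rho   : Gaussian-kernel local density (assumed kernel, bandwidth `cutoff`).
    delta : distance to the nearest point with strictly higher rho; for a
            point with the highest rho, the largest distance in its row.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances via broadcasting, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    # Local density: sum of kernel weights over all other points.
    kernel = np.exp(-((dist / cutoff) ** 2))
    np.fill_diagonal(kernel, 0.0)  # a point does not count itself
    rho = kernel.sum(axis=1)

    n = len(points)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()
    return rho, delta


if __name__ == "__main__":
    # Hypothetical imbalanced toy data: 40 tightly packed points, 6 sparse ones.
    rng = np.random.default_rng(0)
    dense = rng.uniform(-0.5, 0.5, size=(40, 2))
    sparse = 10.0 + rng.uniform(-1.0, 1.0, size=(6, 2))
    rho, delta = density_peak_scores(np.vstack([dense, sparse]), cutoff=0.5)
    print("indices of the two largest delta values:", np.argsort(delta)[-2:])
```

In this kind of plot, points with both high `rho` and large `delta` are candidate centers, while points with low `rho` and large `delta` are more likely noise. A fixed `cutoff` tends to under-rate sparse clusters, which is the imbalance problem the abstract says LDPI is designed to address.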