MSFSS: A whale optimization-based multiple sampling feature selection stacking ensemble algorithm for classifying imbalanced data

Bibliographic details
Published in: AIMS Mathematics, Volume 9, Issue 7, pp. 17504-17530
Main authors: Wang, Shuxiang; Shao, Changbin; Xu, Sen; Yang, Xibei; Yu, Hualong
Format: Journal Article
Language: English
Published: AIMS Press, 2024-01-01
ISSN: 2473-6988
Abstract: Learning from imbalanced data is a challenging task in machine learning because many traditional supervised learning algorithms tend to favor the majority class at the expense of the minority class. Stacking ensembles, which use a meta-learner to combine the predictions of multiple base classifiers, have been applied to class imbalance learning. In this setting, a stacking ensemble is generally combined with one specific sampling algorithm. Such a design, however, may yield suboptimal results, since a single sampling strategy makes it difficult to obtain sufficiently diverse features. In addition, using all of these features may harm the meta-learner, because some of them may be noisy or redundant. To address these problems, we propose a novel stacking ensemble learning algorithm named MSFSS, which divides the learning procedure into two phases. The first phase cross-combines multiple sampling algorithms with multiple supervised learning approaches to construct the meta feature space, which provides the diversity that a stacking ensemble requires. The second phase uses the whale optimization algorithm (WOA) to select the optimal feature subset from the meta feature space, which further improves the quality of the features. Finally, a linear regression classifier is trained as the meta-learner to make the final prediction. Experimental results on 40 benchmark imbalanced datasets show that MSFSS significantly outperforms several popular and state-of-the-art class imbalance ensemble learning algorithms. Specifically, out of the 40 datasets, MSFSS achieves the best F-measure on 27 and the best G-mean on 26. Although it requires more running time than several competitors, the increase is acceptable. These results indicate the effectiveness and superiority of the proposed MSFSS algorithm.
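The abstract evaluates classifiers by F-measure and G-mean rather than plain accuracy. The following is a minimal sketch of why, assuming the usual binary definitions (F-measure as the harmonic mean of precision and recall on the minority class, G-mean as the geometric mean of the true positive and true negative rates). The record does not state the exact formulas used in the paper, and the function names and toy labels below are illustrative only.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 score: 2*TP / (2*TP + FP + FN); 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# Hypothetical toy labels: 95 majority samples (0) followed by 5 minority samples (1).
y_true = [0] * 95 + [1] * 5

# Predictor A ignores the minority class entirely.
pred_majority = [0] * 100

# Predictor B finds 4 of the 5 minority samples at the cost of 10 false alarms.
pred_balanced = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

for name, pred in [("always-majority", pred_majority), ("balanced", pred_balanced)]:
    print(name, accuracy(y_true, pred), f_measure(y_true, pred), g_mean(y_true, pred))
```

On these toy labels the always-majority predictor reaches 0.95 accuracy yet scores zero on both F-measure and G-mean, while the second predictor has lower accuracy but clearly better minority-aware scores. This is the bias toward the majority class that the abstract describes.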
Author affiliations:
Wang, Shuxiang: School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
Shao, Changbin: School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China; Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi, Jiangsu, China
Xu, Sen: School of Information Engineering, Yancheng Institute of Technology, Yancheng, Jiangsu, China
Yang, Xibei: School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
Yu, Hualong: School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
DOI: 10.3934/math.2024851
Open access link: https://doaj.org/article/23cf291005224d8d951cab92b08a252c
Subject terms: feature selection; imbalanced data classification; meta learning; sampling; stacking ensemble; whale optimization algorithm