MSFSS: A whale optimization-based multiple sampling feature selection stacking ensemble algorithm for classifying imbalanced data
| Published in: | AIMS Mathematics, Vol. 9, No. 7, pp. 17504-17530 |
|---|---|
| Main authors: | Wang, Shuxiang; Shao, Changbin; Xu, Sen; Yang, Xibei; Yu, Hualong |
| Format: | Journal Article |
| Language: | English |
| Published: | AIMS Press, 2024-01-01 |
| Subjects: | feature selection; imbalanced data classification; meta learning; sampling; stacking ensemble; whale optimization algorithm |
| ISSN: | 2473-6988 |
| Online access: | Get full text |
| Abstract | Learning from imbalanced data is a challenging task in the machine learning field, as with this type of data, many traditional supervised learning algorithms tend to focus more on the majority class while damaging the interests of the minority class. Stacking ensemble, which formulates an ensemble by using a meta-learner to combine the predictions of multiple base classifiers, has been used for solving class imbalance learning issues. Specifically, in the context of class imbalance learning, a stacking ensemble learning algorithm is generally considered to combine with a specific sampling algorithm. Such an operation, however, might suffer from suboptimization problems as only using a sampling strategy may make it difficult to acquire diverse enough features. In addition, we also note that using all of these features may damage the meta-learner as there may exist noisy and redundant features. To address these problems, we have proposed a novel stacking ensemble learning algorithm named MSFSS, which divides the learning procedure into two phases. The first stage combined multiple sampling algorithms and multiple supervised learning approaches to construct meta feature space by means of cross combination. The adoption of this strategy satisfied the diversity of the stacking ensemble. The second phase adopted the whale optimization algorithm (WOA) to select the optimal sub-feature combination from the meta feature space, which further improved the quality of the features. Finally, a linear regression classifier was trained as the meta learner to conduct the final prediction. Experimental results on 40 benchmarked imbalanced datasets showed that the proposed MSFSS algorithm significantly outperformed several popular and state-of-the-art class imbalance ensemble learning algorithms. Specifically, the MSFSS acquired the best results in terms of the F-measure metric on 27 datasets and the best results in terms of the G-mean metric on 26 datasets, out of 40 datasets. Although it required consuming more time than several other competitors, the increment of the running time was acceptable. The experimental results indicated the effectiveness and superiority of the proposed MSFSS algorithm. |
|---|---|
| Author | Wang, Shuxiang; Xu, Sen; Yu, Hualong; Shao, Changbin; Yang, Xibei |
| Author_xml | 1. Wang, Shuxiang (School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China); 2. Shao, Changbin (School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China; Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi, Jiangsu, China); 3. Xu, Sen (School of Information Engineering, Yancheng Institute of Technology, Yancheng, Jiangsu, China); 4. Yang, Xibei (School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China); 5. Yu, Hualong (School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China) |
| ContentType | Journal Article |
| DOI | 10.3934/math.2024851 |
| Discipline | Mathematics |
| EISSN | 2473-6988 |
| EndPage | 17530 |
| ISICitedReferencesCount | 2 |
| ISSN | 2473-6988 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 7 |
| Language | English |
| OpenAccessLink | https://doaj.org/article/23cf291005224d8d951cab92b08a252c |
| PageCount | 27 |
| PublicationDate | 2024-01-01 |
| PublicationTitle | AIMS mathematics |
| PublicationYear | 2024 |
| Publisher | AIMS Press |
| StartPage | 17504 |
| SubjectTerms | feature selection; imbalanced data classification; meta learning; sampling; stacking ensemble; whale optimization algorithm |
| Title | MSFSS: A whale optimization-based multiple sampling feature selection stacking ensemble algorithm for classifying imbalanced data |
| URI | https://doaj.org/article/23cf291005224d8d951cab92b08a252c |
| Volume | 9 |
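
The abstract reports results in terms of the F-measure and G-mean metrics but this record does not reproduce their definitions. The following is a minimal sketch using the standard binary-classification definitions, which may differ in detail from the ones used in the paper. It assumes the minority class is the positive class; the function names and the `positive` parameter are illustrative and do not come from the article.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall on the positive class."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# Hypothetical labels: 2 minority samples out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10            # a classifier that ignores the minority class
print(f_measure(y_true, always_majority), g_mean(y_true, always_majority))
```

A classifier that always predicts the majority class is 80% accurate on these hypothetical labels, yet both metrics are zero because no minority sample is recovered. This is the kind of behaviour that makes them preferable to plain accuracy on imbalanced data.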