Optimizing imbalanced learning with genetic algorithm.
| Title: | Optimizing imbalanced learning with genetic algorithm. |
|---|---|
| Authors: | Safder MU (Department of Computer Science, Namal University, Mianwali, Punjab, 42250, Pakistan); Naveed SS (Department of Computer Science, Namal University, Mianwali, Punjab, 42250, Pakistan); Khurshid K (Department of Computer Science, Namal University, Mianwali, Punjab, 42250, Pakistan; khawar.khurshid@namal.edu.pk); Salman A (College of Business Administration, American University in the Emirates, DIAC, PO Box 503000, Dubai, United Arab Emirates; School of Electrical Engineering and Computer Science, National University of Sciences and Technology (NUST), Islamabad, 44000, Pakistan); Nizami IF (Department of Electrical Engineering, Bahria University, Islamabad, ICT, 44000, Pakistan) |
| Source: | Scientific Reports [Sci Rep] 2025 Oct 07; Vol. 15 (1), pp. 34857. Date of Electronic Publication: 2025 Oct 07. |
| Publication type: | Journal Article |
| Language: | English |
| Journal info: | Publisher: Nature Publishing Group; Country of Publication: England; NLM ID: 101563288; Publication Model: Electronic; Cited Medium: Internet; ISSN: 2045-2322 (Electronic); Linking ISSN: 20452322; NLM ISO Abbreviation: Sci Rep; Subsets: MEDLINE |
| Imprint Name(s): | Original Publication: London : Nature Publishing Group, copyright 2011- |
| MeSH terms: | Algorithms*; Machine Learning*; Humans; Support Vector Machine; Genetic Algorithms |
| Abstract: | Competing Interests: Declarations. Competing interests: The authors declare no competing financial or non-financial interests that could have influenced the research presented in this manuscript. Declaration of the AI-assisted technologies: During the preparation of this manuscript, the authors used the basic features of OpenAI's GPT model solely to eliminate grammatical errors and improve the overall readability of the document. The authors have thoroughly reviewed the final manuscript and take full responsibility for the content of the published article.<br />Training AI models on imbalanced datasets with skewed class distributions poses a significant challenge, as it biases the model towards the majority class while neglecting the minority class. Various methods, such as Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), have been employed to generate synthetic data to address this issue. However, these methods are often unable to enhance model performance, especially in cases of extreme class imbalance. To overcome this challenge, a novel approach to generating synthetic data is proposed which uses Genetic Algorithms (GAs) and does not require a large sample size. It aims to outperform state-of-the-art methods such as SMOTE, ADASYN, GAN and VAE in terms of model performance. Although GAs are traditionally used for optimization tasks, they can also produce synthetic datasets optimized through the fitness function and population initialization. The synthetic data generation approach analyzes the Simple as well as the Elitist Genetic Algorithms, along with Logistic Regression and Support Vector Machines, to evaluate the population initialization and fitness function. Experimental results across three datasets (Credit Card Fraud Detection, PIMA Indian Diabetes, and PHONEME) demonstrate that the proposed method significantly outperforms the previous techniques on the commonly used performance metrics, including accuracy, precision, recall, F1-score, ROC-AUC, and AP (Accuracy-Precision) curve. This highlights the potential of GAs in the development of accurate and reliable AI models for imbalanced datasets. (© 2025. The Author(s).) |
| References: | Sci Rep. 2024 Oct 10;14(1):23784. (PMID: 39390014); Sensors (Basel). 2022 Apr 23;22(9):. (PMID: 35590937); Sci Rep. 2024 Oct 18;14(1):24489. (PMID: 39424849); Multimed Tools Appl. 2021;80(5):8091-8126. (PMID: 33162782); Sci Rep. 2024 Oct 8;14(1):23368. (PMID: 39375370) |
| Entry Date(s): | Date Created: 20251007; Date Completed: 20251007; Latest Revision: 20251113 |
| Update Code: | 20251113 |
| PubMed Central ID: | PMC12504573 |
| DOI: | 10.1038/s41598-025-09424-x |
| PMID: | 41057396 |
| Database: | MEDLINE |
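
The abstract describes generating synthetic minority-class samples with an elitist genetic algorithm driven by population initialization and a fitness function. The following is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the parameters, the initialization from jittered minority points, and in particular the distance-based fitness, which stands in for the Logistic Regression and SVM based evaluation mentioned in the abstract.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(ind, minority, majority):
    # Assumed stand-in fitness (not the paper's): reward candidates that lie
    # close to a real minority point and far from every majority point.
    d_min = min(sq_dist(ind, m) for m in minority)
    d_maj = min(sq_dist(ind, m) for m in majority)
    return d_maj - d_min


def clip(ind, lo, hi):
    # Keep each gene inside the observed minority feature range so the
    # population cannot drift arbitrarily far from the real data.
    return [min(max(x, l), h) for x, l, h in zip(ind, lo, hi)]


def crossover(p1, p2, rng):
    # Uniform crossover: each feature is taken from one parent at random.
    return [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]


def mutate(ind, rng, rate=0.2, scale=0.1):
    # Gaussian mutation applied independently to each feature.
    return [x + rng.gauss(0.0, scale) if rng.random() < rate else x for x in ind]


def generate_synthetic(minority, majority, n_samples,
                       generations=30, elite_frac=0.2, seed=0):
    """Evolve n_samples (>= 2) synthetic minority points with an elitist GA."""
    rng = random.Random(seed)
    lo = [min(col) for col in zip(*minority)]
    hi = [max(col) for col in zip(*minority)]

    def score(ind):
        return fitness(ind, minority, majority)

    # Population initialization: jittered copies of real minority samples.
    pop = [clip(mutate(rng.choice(minority), rng, rate=1.0), lo, hi)
           for _ in range(n_samples)]
    n_elite = max(1, int(elite_frac * n_samples))

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[:n_elite]                    # elitism: best survive unchanged
        parents = pop[:max(2, n_samples // 2)]   # truncation selection
        children = []
        while len(children) < n_samples - n_elite:
            p1, p2 = rng.sample(parents, 2)
            children.append(clip(mutate(crossover(p1, p2, rng), rng), lo, hi))
        pop = elite + children
    return pop


if __name__ == "__main__":
    minority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]
    majority = [[2.0, 2.0], [2.5, 1.8], [1.9, 2.2], [2.2, 2.4]]
    for point in generate_synthetic(minority, majority, n_samples=5):
        print([round(v, 3) for v in point])
```

In practice the returned points would be appended to the training set with the minority label before fitting a classifier. Swapping `fitness` for a classifier-based score would bring the sketch closer to what the abstract describes, at the cost of an external dependency.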