Optimizing cancer classification: a hybrid RDO-XGBoost approach for feature selection and predictive insights

The identification of relevant biomarkers from high-dimensional cancer data remains a significant challenge due to the complexity and heterogeneity inherent in various cancer types. Conventional feature selection methods often struggle to effectively navigate the vast solution space while maintainin...

Full description

Saved in:
Bibliographic Details
Published in:Cancer Immunology, Immunotherapy : CII Vol. 73; no. 12; p. 261
Main Authors: Yaqoob, Abrar, Verma, Navneet Kumar, Aziz, Rabia Musheer, Shah, Mohd Asif
Format: Journal Article
Language:English
Published: Berlin/Heidelberg Springer Berlin Heidelberg 09.10.2024
Springer Nature B.V
Subjects:
ISSN:1432-0851, 0340-7004, 1432-0851
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The identification of relevant biomarkers from high-dimensional cancer data remains a significant challenge due to the complexity and heterogeneity inherent in various cancer types. Conventional feature selection methods often struggle to effectively navigate the vast solution space while maintaining high predictive accuracy. In response to these challenges, we introduce a novel feature selection approach that integrates Random Drift Optimization (RDO) with XGBoost, specifically designed to enhance the performance of cancer classification tasks. Our proposed framework not only improves classification accuracy but also offers valuable insights into the underlying biological mechanisms driving cancer progression. Through comprehensive experiments conducted on real-world cancer datasets, including Central Nervous System (CNS), Leukemia, Breast, and Ovarian cancers, we demonstrate the efficacy of our method in identifying a smaller subset of unique and relevant genes. This selection results in significantly improved classification efficiency and accuracy. When compared with popular classifiers such as Support Vector Machine, K-Nearest Neighbor, and Naive Bayes, our approach consistently outperforms these models in terms of both accuracy and F-measure metrics. For instance, our framework achieved an accuracy of 97.24% in the CNS dataset, 99.14% in Leukemia, 95.21% in Ovarian, and 87.62% in Breast cancer, showcasing its robustness and effectiveness across different types of cancer data. These results underline the potential of our RDO-XGBoost framework as a promising solution for feature selection in cancer data analysis, offering enhanced predictive performance and valuable biological insights.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1432-0851
0340-7004
1432-0851
DOI:10.1007/s00262-024-03843-x