Gene selection and cancer classification using interaction-based feature clustering and improved-binary Bat algorithm

Bibliographic details
Published in: Computers in Biology and Medicine, Vol. 181, p. 109071
Main authors: Esfandiari, Ahmad; Nasiri, Niki
Format: Journal Article
Language: English
Publication details: United States, Elsevier Ltd, 1 October 2024
Elsevier Limited
ISSN: 0010-4825, 1879-0534
Description
Summary: In high-dimensional gene expression data, selecting an optimal subset of genes is crucial for achieving high classification accuracy and reliable diagnosis of diseases. This paper proposes a two-stage hybrid model for gene selection based on clustering and a swarm intelligence algorithm to identify the most informative genes with high accuracy. First, a clustering-based multivariate filter approach explores the interactions between features and eliminates redundant or irrelevant ones. Then, with premature convergence in the binary Bat algorithm kept under control, the optimal gene subset is determined using different classifiers with the Monte Carlo cross-validation data partitioning model. The effectiveness of the proposed framework is evaluated on eight gene expression datasets by comparison with other recently published algorithms. Experiments confirm that on seven of the eight datasets, the proposed method achieves superior results in terms of classification accuracy and gene subset size. In particular, it achieves a classification accuracy of 100% on the Lymphoma and Ovarian datasets and above 97.4% on the remaining datasets with a minimum number of genes. The results demonstrate that the proposed algorithm has the potential to solve the feature selection problem in other applications with high-dimensional datasets.
•A new multivariate filter method was developed to identify informative genes.
•The exploration ability of the Bat algorithm (BA) was improved using new inertia weight adjustment methods.
•Comprehensive experiments demonstrate the effectiveness of the proposed IFC-iBBA.
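The Monte Carlo cross-validation step named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say which classifiers, how many random splits, or what train/test ratio the paper uses, so the nearest-centroid classifier, the function names `nearest_centroid_predict` and `monte_carlo_cv`, the parameters `n_splits=30` and `test_fraction=0.3`, and the toy data are all assumptions chosen for illustration. The sketch scores a candidate gene subset by its mean accuracy over repeated random train/test partitions, which is the kind of fitness value a wrapper search such as a binary Bat algorithm could optimize.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x (squared Euclidean).

    Stand-in classifier: the record does not name the classifiers used.
    """
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def monte_carlo_cv(X, y, gene_subset, n_splits=30, test_fraction=0.3, seed=0):
    """Mean accuracy of a gene subset over repeated random train/test splits.

    n_splits and test_fraction are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    # Keep only the columns (genes) in the candidate subset.
    Xs = [[row[g] for g in gene_subset] for row in X]
    n_test = max(1, int(round(test_fraction * len(Xs))))
    accuracies = []
    for _ in range(n_splits):
        idx = list(range(len(Xs)))
        rng.shuffle(idx)  # a fresh random partition on every repetition
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        train_X = [Xs[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            nearest_centroid_predict(train_X, train_y, Xs[i]) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / n_test)
    return mean(accuracies)


# Hypothetical toy data: gene 0 separates the two classes, gene 1 does not.
X = [[0.1 * i, float(i % 2)] for i in range(10)] + \
    [[5.0 + 0.1 * i, float(i % 2)] for i in range(10)]
y = [0] * 10 + [1] * 10

print("gene 0 only:", monte_carlo_cv(X, y, [0]))
print("gene 1 only:", monte_carlo_cv(X, y, [1]))
```

Unlike k-fold cross-validation, each repetition here draws an independent random split, so a sample may appear in several test sets or in none. Averaging over many splits reduces the dependence of the score on any single partition, which matters for gene expression datasets with few samples.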
DOI:10.1016/j.compbiomed.2024.109071