A hybrid feature selection method based on information theory and binary butterfly optimization algorithm
| Published in: | Engineering Applications of Artificial Intelligence, Vol. 97, p. 104079 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.01.2021 |
| ISSN: | 0952-1976, 1873-6769 |
| Summary: | Feature selection is the problem of finding the optimal subset of features for predicting class labels by removing irrelevant or redundant features. The S-shaped Binary Butterfly Optimization Algorithm (S-bBOA) is a nature-inspired algorithm for solving feature selection problems. The evidence shows that S-bBOA performs better in exploration, exploitation, convergence, and avoiding local optima than other optimization algorithms. However, S-bBOA does not consider the redundancy and relevancy of features. This paper first proposes the Information Gain binary Butterfly Optimization Algorithm (IG-bBOA) to overcome these limitations of S-bBOA. IG-bBOA maximizes both the classification accuracy and the mean of the mutual information between features and class labels. In addition, IG-bBOA tries to minimize the number of selected features, and it is used within a proposed three-phase method called Ensemble Information Theory based binary Butterfly Optimization Algorithm (EIT-bBOA). In the first phase, 80% of irrelevant and redundant features are removed using Minimal Redundancy-Maximal New Classification Information (MR-MNCI) feature selection. In the second phase, the best feature subset is selected using IG-bBOA. Finally, a similarity-based ranking method is used to select the final feature subset. Experiments are conducted on six standard datasets from the UCI repository. The findings confirm the efficiency of the proposed method in improving classification accuracy and selecting the best feature subset with the minimum number of features in most cases. • Minimizing two types of feature redundancy and maximizing relevancy between features and class label before solution optimization. • Using a three-objective function to determine the fitness of each solution in the binary butterfly optimization algorithm. • Using an ensemble similarity-based ranking method in the final phase to select the best subset. |
|---|---|
| DOI: | 10.1016/j.engappai.2020.104079 |
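
The summary describes a three-objective fitness: maximize classification accuracy, maximize the mean mutual information between selected features and class labels, and minimize the number of selected features. This record does not give the paper's actual formula, so the following is only a minimal sketch under stated assumptions: a weighted sum with illustrative weights (`w_acc`, `w_mi`, `w_size`), mutual information normalized by the label entropy, discrete features, and a leave-one-out 1-nearest-neighbour classifier standing in for whichever classifier the authors used. All function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) * p(y))).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def loo_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-NN classifier using Hamming distance."""
    n = len(rows)
    correct = 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: sum(a != b for a, b in zip(rows[i], rows[j])))
        correct += labels[nearest] == labels[i]
    return correct / n

def fitness(mask, X, y, w_acc=0.8, w_mi=0.1, w_size=0.1):
    """Weighted three-objective score of a binary feature mask (higher is better).

    The weights and the weighted-sum form are assumptions for illustration,
    not values taken from the paper.
    """
    selected = [k for k, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty subset cannot classify anything
    rows = [[row[k] for k in selected] for row in X]
    acc = loo_1nn_accuracy(rows, y)
    # I(Y;Y) equals the label entropy H(Y), an upper bound on I(X;Y).
    h_y = mutual_information(y, y)
    mean_mi = sum(mutual_information([row[k] for row in X], y)
                  for k in selected) / len(selected)
    mi_norm = mean_mi / h_y if h_y > 0 else 0.0
    size_term = 1 - len(selected) / len(mask)  # fewer features scores higher
    return w_acc * acc + w_mi * mi_norm + w_size * size_term

# Toy data: feature 0 copies the label, feature 1 is noise, feature 2 is constant.
X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 1]]
y = [0, 0, 1, 1, 0, 1]

score_relevant_only = fitness([1, 0, 0], X, y)
score_all_features = fitness([1, 1, 1], X, y)
```

On this toy data the mask that keeps only the informative feature scores higher than the mask that keeps all three, which is the behaviour a binary optimizer such as IG-bBOA would exploit when searching over masks. A weighted sum is only one way to combine the three objectives; the paper may combine them differently.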