A hybrid feature selection method based on information theory and binary butterfly optimization algorithm


Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, Vol. 97, p. 104079
Main Authors: Sadeghian, Zohre; Akbari, Ebrahim; Nematzadeh, Hossein
Format: Journal Article
Language: English
Published: Elsevier Ltd 01.01.2021
ISSN: 0952-1976, 1873-6769
Description
Summary: Feature selection is the problem of finding the optimal subset of features for predicting class labels by removing irrelevant or redundant features. The S-shaped Binary Butterfly Optimization Algorithm (S-bBOA) is a nature-inspired algorithm for solving feature selection problems. The evidence shows that S-bBOA performs better than other optimization algorithms in exploration, exploitation, convergence, and avoiding local optima. However, S-bBOA does not consider the redundancy and relevancy of features. This paper first proposes the Information Gain binary Butterfly Optimization Algorithm (IG-bBOA) to overcome this limitation of S-bBOA. IG-bBOA maximizes both the classification accuracy and the mean of the mutual information between features and class labels. In addition, IG-bBOA tries to minimize the number of selected features. It is used within a proposed three-phase method called Ensemble Information Theory based binary Butterfly Optimization Algorithm (EIT-bBOA). In the first phase, 80% of irrelevant and redundant features are removed using Minimal Redundancy-Maximal New Classification Information (MR-MNCI) feature selection. In the second phase, the best feature subset is selected using IG-bBOA. Finally, a similarity-based ranking method is used to select the final feature subset. Experiments are conducted on six standard datasets from the UCI repository. The findings confirm the efficiency of the proposed method in improving the classification accuracy and selecting the best feature subset with the minimum number of features in most cases.
• Minimizing two types of feature redundancy and maximizing relevancy between features and class label, before solution optimization.
• Using a three-objective function to determine the fitness of each solution in the binary butterfly optimization algorithm.
• Using an ensemble similarity-based ranking method in the final phase for selection of the best subset.
DOI: 10.1016/j.engappai.2020.104079
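
The summary states that IG-bBOA scores each candidate subset on three objectives: classification accuracy, the mean mutual information between the selected features and the class labels, and the number of selected features. The record does not give the paper's actual formula, so the following is only an illustrative sketch. The weighted-sum form, the weights `w_acc`, `w_mi`, and `w_size`, and the function names are all assumptions, and the accuracy is passed in as a number rather than computed by a classifier.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def fitness(mask, columns, labels, accuracy, w_acc=0.8, w_mi=0.1, w_size=0.1):
    """Hypothetical three-objective score for a binary feature mask.

    mask     -- sequence of 0/1 flags, one per feature column
    columns  -- list of discrete feature columns (one sequence per feature)
    labels   -- class labels, same length as each column
    accuracy -- classifier accuracy for this subset, supplied by the caller
    The weights are illustrative assumptions, not values from the paper.
    """
    selected = [col for bit, col in zip(mask, columns) if bit]
    if not selected:
        # An empty subset cannot be evaluated; rank it below everything else.
        return float("-inf")
    mean_mi = sum(mutual_information(col, labels) for col in selected) / len(selected)
    size_ratio = len(selected) / len(columns)
    # Reward accuracy and relevance, penalize larger subsets.
    return w_acc * accuracy + w_mi * mean_mi - w_size * size_ratio
```

In this sketch, at equal accuracy a mask that keeps one informative feature scores higher than a mask that also keeps a feature independent of the labels, because the second mask lowers the mean mutual information and raises the size penalty.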