Feature selection based on dataset variance optimization using Hybrid Sine Cosine – Firehawk Algorithm (HSCFHA)


Bibliographic details
Published in: Future Generation Computer Systems, Vol. 155, pp. 272-286
Main authors: Moosavi, Syed Kumayl Raza; Saadat, Ahsan; Abaid, Zainab; Ni, Wei; Li, Kai; Guizani, Mohsen
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 June 2024
ISSN: 0167-739X, 1872-7115
Description
Summary: Feature selection plays a pivotal role in preprocessing data for machine learning (ML) models. It entails choosing a subset of pertinent features to enhance the model’s accuracy and minimize overfitting. Wrapper methods based on metaheuristics are one approach to feature selection, leveraging the predictive accuracy of a learning algorithm to form a condensed set of features. Traditionally, this method uses the accuracy of a K-Nearest Neighbor (KNN) classifier as the objective to maximize in its cost function. However, this approach often yields suboptimal results in large sample spaces and demands considerable computational resources. To circumvent these shortcomings, this work proposes a novel metaheuristic algorithm, termed the Hybrid Sine Cosine Firehawk Algorithm. Furthermore, a novel feature selection technique is designed that uses this hybrid algorithm to eliminate insignificant and redundant features by incorporating the minimization of dataset variance in the cost function. Additionally, the hybridization of multiple metaheuristic algorithms combines the strengths of each algorithm to improve the exploration ability. The proposed technique is tested on 22 University of California Irvine datasets of low, medium and high dimensionality and compared to the traditional KNN-based approach. The technique is also compared with other state-of-the-art metaheuristic techniques, namely Particle Swarm Optimizer, Grey Wolf Optimizer, Whale Optimization Algorithm, Hybrid Ant Colony Optimizer and Improved Binary Bat Algorithm. The results show significant improvements over previous techniques: minimal loss of essential data, a reduced raw data size obtained in considerably less time, and a well-balanced confusion matrix.
•Feature selection using variance minimization.
•Hybrid Sine Cosine - Firehawk Algorithm.
•Comparative analysis with metaheuristic techniques.
•Tested on multi-dimensional and bi/multi-class datasets.
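For readers unfamiliar with the traditional wrapper approach that the summary uses as its baseline, the following is a minimal sketch of a KNN-accuracy cost function evaluated on a binary feature mask. It is not the paper's method: the function names, the weighting parameter `alpha`, the weighted-sum form of the cost, and the toy data are all assumptions made for illustration, and the record gives no formula for the variance-based cost of HSCFHA. A metaheuristic would search the space of masks; here the space is small enough to enumerate.

```python
import itertools
import math


def knn_loo_accuracy(rows, labels, mask, k=1):
    """Leave-one-out accuracy of a k-NN classifier restricted to masked-in features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, query in enumerate(rows):
        # Euclidean distance from the held-out sample to every other sample.
        dists = []
        for j, other in enumerate(rows):
            if j == i:
                continue
            d = math.sqrt(sum((query[c] - other[c]) ** 2 for c in cols))
            dists.append((d, labels[j]))
        dists.sort(key=lambda pair: pair[0])
        # Majority vote among the k nearest neighbours.
        votes = {}
        for _, label in dists[:k]:
            votes[label] = votes.get(label, 0) + 1
        predicted = max(votes, key=votes.get)
        if predicted == labels[i]:
            correct += 1
    return correct / len(rows)


def wrapper_cost(rows, labels, mask, alpha=0.99, k=1):
    """Hypothetical wrapper cost: weighted sum of KNN error and fraction of features kept."""
    error = 1.0 - knn_loo_accuracy(rows, labels, mask, k=k)
    kept_ratio = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * kept_ratio


def best_mask(rows, labels):
    """Exhaustive search over non-empty masks; stands in for the metaheuristic search."""
    n = len(rows[0])
    candidates = [m for m in itertools.product((0, 1), repeat=n) if any(m)]
    return min(candidates, key=lambda m: wrapper_cost(rows, labels, m))


# Toy data (assumed): feature 0 separates the classes, feature 1 is noise.
ROWS = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 5.1), (1.1, 0.9), (1.2, 9.1)]
LABELS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(best_mask(ROWS, LABELS))
```

Every cost evaluation in this sketch retrains and scores a classifier, which illustrates the computational burden the summary attributes to the KNN-based wrapper and which the proposed variance-based cost is intended to avoid.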
DOI:10.1016/j.future.2024.02.017