A novel two-stage feature selection method based on random forest and improved genetic algorithm for enhancing classification in machine learning


Bibliographic details
Published in: Scientific Reports, Vol. 15, No. 1, pp. 16828 - 16
Main authors: Ding, Junyao; Du, Jianchao; Wang, Hejie; Xiao, Song
Format: Journal Article
Language: English
Publication details: London: Nature Publishing Group UK, 14.05.2025
Nature Publishing Group
Nature Portfolio
ISSN: 2045-2322
Description
Summary: Data acquisition methods are becoming increasingly diverse and advanced, leading to higher data dimensionality, blurred classification boundaries, and overfitting, all of which reduce the accuracy of machine learning models. Many studies have sought to improve model performance through feature selection. However, any single feature selection method tends to be incomplete, unstable, or time-consuming. Combining the advantages of several feature selection methods can help overcome these defects. This paper proposes a two-stage feature selection method based on random forest and an improved genetic algorithm. First, random forest importance scores are calculated and ranked, and features are preliminarily eliminated according to these scores, reducing the time complexity of the subsequent process. Then, the improved genetic algorithm searches for the globally optimal feature subset. This stage introduces a multi-objective fitness function to guide the search, minimizing the number of features in the subset while enhancing classification accuracy. The paper also adds an adaptive mechanism and an evolution strategy to counter the loss of population diversity and the degeneration that occur in the later stages of iteration, thereby enhancing search efficiency. Experimental results on eight UCI datasets show that the proposed method significantly improves classification performance and has excellent feature selection capability.
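The two-stage idea in the summary can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the importance scores are hypothetical numbers standing in for a random forest's output, `toy_accuracy` stands in for a real classifier evaluation, the weighted-sum fitness with weight `alpha` is only one plausible way to combine the two objectives, and the diversity-based mutation rate is an assumed stand-in for the paper's adaptive mechanism, which the summary does not specify.

```python
import random


def prefilter(importances, keep_ratio=0.5):
    """Stage 1: rank features by importance and keep the top fraction."""
    k = max(1, int(len(importances) * keep_ratio))
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def fitness(mask, accuracy_fn, alpha=0.9):
    """Assumed weighted sum: reward accuracy, penalise large subsets."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 0.0  # an empty subset cannot classify anything
    size_term = 1 - n_selected / len(mask)
    return alpha * accuracy_fn(mask) + (1 - alpha) * size_term


def genetic_search(n_features, accuracy_fn, pop_size=20,
                   generations=30, alpha=0.9, seed=0):
    """Stage 2: search bit masks over the features kept by stage 1."""
    rng = random.Random(seed)

    def score(mask):
        return fitness(mask, accuracy_fn, alpha)

    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        # Assumed adaptive rule: mutate more when the population
        # contains few distinct individuals (low diversity).
        diversity = len({tuple(m) for m in pop}) / pop_size
        mutation_rate = 0.02 + 0.2 * (1 - diversity)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: carry the best mask over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return max(pop, key=score)


# Hypothetical importance scores, standing in for a random forest's output.
importances = [0.30, 0.02, 0.25, 0.01, 0.20, 0.03, 0.15, 0.04]
kept = prefilter(importances, keep_ratio=0.5)


def toy_accuracy(mask):
    # Stand-in for cross-validated accuracy: only features 0 and 2 help.
    chosen = {kept[i] for i, bit in enumerate(mask) if bit}
    return 0.5 + 0.2 * (0 in chosen) + 0.2 * (2 in chosen)


best_mask = genetic_search(len(kept), toy_accuracy)
selected = [kept[i] for i, bit in enumerate(best_mask) if bit]
print("kept after stage 1:", kept)
print("selected after stage 2:", selected)
```

In practice, `importances` would come from a trained random forest and `toy_accuracy` would be replaced by a cross-validated classifier score on the masked feature set. Because the first stage shrinks the mask length, each generation of the second stage has a smaller search space to cover.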
DOI: 10.1038/s41598-025-01761-1