A meta-heuristic feature selection algorithm combining random sampling accelerator and ensemble using data perturbation

Detailed bibliography
Published in:	Applied intelligence (Dordrecht, Netherlands) Vol. 53, no. 24, pp. 29781-29798
Main authors:	Zhang, Shuaishuai; Liu, Keyu; Xu, Taihua; Yang, Xibei; Zhang, Ao
Format:	Journal Article
Language:	English
Publication details:	New York: Springer US, 01.12.2023; Springer Nature B.V
Subjects:	Algorithms; Artificial Intelligence; Classification; Computer Science; Datasets; Feature selection; Heuristic; Heuristic methods; Machines; Manufacturing; Mechanical Engineering; Perturbation; Processes; Random sampling; Stability; Neighborhood rough set; Meta-heuristic; Data perturbation
ISSN:	0924-669X, 1573-7497

Description
Summary:	Meta-heuristic algorithms are widely used for feature selection because they can obtain the global optimal solution. However, they become too time-consuming when the number of samples is large. Most studies settle for approximately optimal solutions to avoid this cost, but doing so degrades classification performance, especially classification stability. To address these problems, this paper proposes a new feature selection framework. First, the framework exploits a voting ensemble strategy to improve classification stability by reducing the impact of misclassified labels on the overall classification results. Second, it uses a data perturbation strategy to enhance classification accuracy. In particular, data perturbation generates more neighborhood relationships in the dataset, which can reveal the distribution of the samples' various features. A voting ensemble over these different feature distributions can extract more information from the dataset, so initially misclassified samples are more likely to be assigned to the correct class. Third, the framework incorporates a random sampling accelerator, which cuts the excessive time consumption by reducing the size of the sample space searched. Finally, to verify the effectiveness of the proposed framework, four meta-heuristic feature selection methods based on a neighborhood rough set are compared on 20 datasets. The experimental results indicate that the framework improves classification performance and accelerates feature selection, particularly on large sample sizes.
DOI:	10.1007/s10489-023-05123-0
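
The three ingredients named in the summary (random sampling, data perturbation, voting ensemble) can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify the perturbation scheme, the meta-heuristic, or the neighborhood rough set evaluation, so Gaussian noise, a random subset search, and a 1-nearest-neighbour classifier are used here as stand-in assumptions. All function names and parameters are hypothetical.

```python
import random
from collections import Counter


def majority_vote(votes):
    """Return the most common label per sample across ensemble members."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*votes)]


def perturb(X, scale, rng):
    """Assumed perturbation: add small Gaussian noise to every feature value."""
    return [[v + rng.gauss(0.0, scale) for v in row] for row in X]


def nn_predict(train_X, train_y, query, features):
    """1-nearest-neighbour prediction restricted to the selected features."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in features)
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return train_y[best]


def select_features(X, y, n_trials, rng):
    """Stand-in for a meta-heuristic search: try random feature subsets and
    keep the one with the best leave-one-out 1-NN accuracy."""
    n_features = len(X[0])
    best_subset, best_score = list(range(n_features)), -1.0
    for _ in range(n_trials):
        subset = [f for f in range(n_features) if rng.random() < 0.5]
        if not subset:  # never evaluate an empty subset
            subset = [rng.randrange(n_features)]
        hits = 0
        for i in range(len(X)):
            rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
            hits += nn_predict(rest_X, rest_y, X[i], subset) == y[i]
        score = hits / len(X)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset


def ensemble_predict(X, y, queries, n_members=5, sample_fraction=0.5,
                     noise=0.05, seed=0):
    """Each member searches on a perturbed random subsample; members then vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_members):
        # Random sampling: run the feature search on a subset of the samples.
        k = max(2, int(sample_fraction * len(X)))
        idx = rng.sample(range(len(X)), k)
        # Data perturbation: each member sees a slightly different dataset.
        sub_X = perturb([X[i] for i in idx], noise, rng)
        sub_y = [y[i] for i in idx]
        features = select_features(sub_X, sub_y, n_trials=10, rng=rng)
        votes.append([nn_predict(sub_X, sub_y, q, features) for q in queries])
    # Voting ensemble: combine the members' predictions label by label.
    return majority_vote(votes)
```

Calling `ensemble_predict(X, y, queries)` with lists of numeric feature rows returns one voted label per query. Lowering `sample_fraction` shrinks the sample set each member searches, which is the speed-for-coverage trade-off the summary attributes to the random sampling accelerator.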