Large-Scale Meta-Heuristic Feature Selection Based on BPSO Assisted Rough Hypercuboid Approach
The selection of prominent features for building more compact and efficient models is an important data preprocessing task in the field of data mining. The rough hypercuboid approach is an emerging technique that can be applied to eliminate irrelevant and redundant features, especially for the inexa...
Saved in:
| Published in: | IEEE transaction on neural networks and learning systems Vol. 34; no. 12; pp. 10889 - 10903 |
|---|---|
| Main Authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: |
United States
IEEE
01.12.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 2162-237X, 2162-2388, 2162-2388 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The selection of prominent features for building more compact and efficient models is an important data preprocessing task in the field of data mining. The rough hypercuboid approach is an emerging technique that can be applied to eliminate irrelevant and redundant features, especially for the inexactness problem in approximate numerical classification. By integrating the meta-heuristic-based evolutionary search technique, a novel global search method for numerical feature selection is proposed in this article based on the hybridization of the rough hypercuboid approach and binary particle swarm optimization (BPSO) algorithm, namely RH-BPSO. To further alleviate the issue of high computational cost when processing large-scale datasets, parallelization approaches for calculating the hybrid feature evaluation criteria are presented by decomposing and recombining hypercuboid equivalence partition matrix via horizontal data partitioning. A distributed meta-heuristic optimized rough hypercuboid feature selection (DiRH-BPSO) algorithm is thus developed and embedded in the Apache Spark cloud computing model. Extensive experimental results indicate that RH-BPSO is promising and can significantly outperform the other representative feature selection algorithms in terms of classification accuracy, the cardinality of the selected feature subset, and execution efficiency. Moreover, experiments on distributed-memory multicore clusters show that DiRH-BPSO is significantly faster than its sequential counterpart and is perfectly capable of completing large-scale feature selection tasks that fail on a single node due to memory constraints. Parallel scalability and extensibility analysis also demonstrate that DiRH-BPSO could scale out and extend well with the growth of computational nodes and the volume of data. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
| ISSN: | 2162-237X 2162-2388 2162-2388 |
| DOI: | 10.1109/TNNLS.2022.3171614 |