Large-Scale Meta-Heuristic Feature Selection Based on BPSO Assisted Rough Hypercuboid Approach

Bibliographic Details
Published in:	IEEE Transactions on Neural Networks and Learning Systems Vol. 34; no. 12; pp. 10889-10903
Main Authors:	Luo, Chuan; Wang, Sizhao; Li, Tianrui; Chen, Hongmei; Lv, Jiancheng; Yi, Zhang
Format:	Journal Article
Language:	English
Published:	United States IEEE 01.12.2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Classification; Cloud computing; Computational modeling; Computer applications; Computing costs; Data mining; Data processing; Distributed memory; Feature extraction; Feature selection; Heuristic; Heuristic algorithms; Heuristic methods; Hybridization; hypercuboid; Mathematical models; Matrix partitioning; parallel computing; Parallel processing; Particle swarm optimization; Partitioning algorithms; Problem solving; Rough sets; Search methods; Sparks
ISSN:	2162-237X, 2162-2388

Description
Summary:	The selection of prominent features for building more compact and efficient models is an important data preprocessing task in the field of data mining. The rough hypercuboid approach is an emerging technique that can be applied to eliminate irrelevant and redundant features, especially for the inexactness problem in approximate numerical classification. By integrating a meta-heuristic-based evolutionary search technique, a novel global search method for numerical feature selection, named RH-BPSO, is proposed in this article based on the hybridization of the rough hypercuboid approach and the binary particle swarm optimization (BPSO) algorithm. To further alleviate the high computational cost of processing large-scale datasets, parallelization approaches for calculating the hybrid feature evaluation criteria are presented by decomposing and recombining the hypercuboid equivalence partition matrix via horizontal data partitioning. A distributed meta-heuristic optimized rough hypercuboid feature selection (DiRH-BPSO) algorithm is thus developed and embedded in the Apache Spark cloud computing model. Extensive experimental results indicate that RH-BPSO is promising and can significantly outperform other representative feature selection algorithms in terms of classification accuracy, the cardinality of the selected feature subset, and execution efficiency. Moreover, experiments on distributed-memory multicore clusters show that DiRH-BPSO is significantly faster than its sequential counterpart and can complete large-scale feature selection tasks that fail on a single node due to memory constraints. Parallel scalability and extensibility analyses also demonstrate that DiRH-BPSO scales out and extends well with the growth of computational nodes and the volume of data.
DOI:	10.1109/TNNLS.2022.3171614
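
Illustration:	The summary names binary particle swarm optimization (BPSO) as the search component of RH-BPSO. The sketch below shows only a generic, textbook-style BPSO loop over 0/1 feature masks, to make the search idea concrete. It is not the authors' algorithm: the rough hypercuboid evaluation criterion is not reproduced here, and `toy_fitness`, `RELEVANT`, and all parameter values (`w`, `c1`, `c2`, `v_max`, swarm size) are hypothetical stand-ins chosen for illustration.

```python
import math
import random

# Hypothetical ground truth for the toy problem: indices of "useful" features.
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    """Stand-in criterion (assumption): reward relevant features, penalize subset size."""
    hits = sum(1 for j in RELEVANT if mask[j])
    return hits / len(RELEVANT) - 0.05 * sum(mask)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bpso(fitness, n_features, n_particles=20, n_iters=50,
         w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Generic binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]

    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity, then use sigmoid(v) as the
                # probability that this bit is set to 1.
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    best_mask, best_score = bpso(toy_fitness, n_features=8)
    print("selected features:", [j for j, b in enumerate(best_mask) if b])
    print("fitness:", best_score)
```

In a setting like the one the summary describes, the fitness call would presumably be replaced by the hybrid feature evaluation criteria, which is the part the article parallelizes through horizontal data partitioning on Apache Spark; the swarm update itself stays small by comparison.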