Incremental feature selection: Parallel approach with local neighborhood rough sets and composite entropy

Bibliographic Details
Published in:	Pattern recognition Vol. 159; p. 111141
Main Authors:	Xu, Weihua, Ye, Weirui
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd 01.03.2025
Subjects:	Composite entropy; Feature selection; Incremental algorithm; Rough set
ISSN:	0031-3203

Description
Summary:	Rough set theory is a powerful mathematical framework for managing uncertainty and is widely utilized in feature selection. However, traditional rough set-based feature selection algorithms encounter significant challenges, especially when processing large-scale incremental data and adapting to the dynamic nature of real-world scenarios, where both data volume and feature sets are continuously changing. To overcome these limitations, this study proposes an innovative algorithm that integrates local neighborhood rough sets with composite entropy to measure uncertainty in information systems more accurately. By incorporating decision distribution, composite entropy enhances the precision of uncertainty quantification, thereby improving the effectiveness of the algorithm in feature selection. To further improve performance in handling large-scale incremental data, matrix operations are employed in place of traditional set-based methods, allowing the algorithm to fully utilize modern hardware capabilities for accelerated processing. Additionally, parallel computing technology is integrated to further enhance computational speed. An incremental version of the algorithm is also introduced to better adapt to dynamic data environments, increasing its flexibility and practicality. Comprehensive experimental evaluations demonstrate that the proposed algorithm significantly surpasses existing methods in both effectiveness and efficiency.
• The algorithmic paradigm has been innovatively shifted from traditional set operations to matrix and logical computations.
• We integrate parallel computing technology into our algorithms by breaking down relatively independent tasks.
• This approach avoids issues such as data inconsistency and deadlocks, thereby significantly improving the runtime.
• The results demonstrate that our algorithms outperform other methods in terms of both effectiveness and efficiency.
DOI:	10.1016/j.patcog.2024.111141
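
The summary mentions replacing set operations with matrix and logical computations. As a rough illustration of that general idea only, the sketch below computes the dependency (positive-region fraction) of a standard neighborhood rough set with NumPy matrix operations. It is not the authors' algorithm: the function names, the Euclidean distance, the radius `delta`, and the toy data are all assumptions made for this example, and it covers neither the local variant, the composite entropy, nor the incremental and parallel parts described in the paper.

```python
import numpy as np

def neighborhood_matrix(X, delta):
    """Boolean n x n matrix: entry (i, j) is True when sample j lies
    within Euclidean distance delta of sample i (assumed metric)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist <= delta

def dependency(X, y, features, delta):
    """Fraction of samples whose whole neighborhood shares one decision
    label, i.e. the positive region of a standard neighborhood rough set."""
    N = neighborhood_matrix(X[:, features], delta).astype(int)
    classes = np.unique(y)
    D = (y[:, None] == classes[None, :]).astype(int)  # one-hot decisions
    counts = N @ D          # counts[i, k]: neighbors of sample i in class k
    sizes = N.sum(axis=1)   # neighborhood size of each sample
    # A sample is consistent when all of its neighbors fall in one class.
    consistent = (counts == sizes[:, None]).any(axis=1)
    return consistent.mean()

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = np.array([[0.0, 0.0], [0.1, 1.0], [1.0, 0.0], [1.1, 1.0]])
y = np.array([0, 0, 1, 1])
dep_first = dependency(X, y, [0], delta=0.2)
dep_second = dependency(X, y, [1], delta=0.2)
```

On this toy data, feature 0 gives a dependency of 1.0 and feature 1 gives 0.0, so a greedy selector would prefer feature 0. The single matrix product `N @ D` replaces a per-sample subset test, which is the kind of operation that vectorized hardware handles well.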