Top-k Feature Selection Framework Using Robust 0-1 Integer Programming

Bibliographic details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, Vol. 32, No. 7, pp. 3005-3019
Main authors:	Zhang, Xiaoqin; Fan, Mingyu; Wang, Di; Zhou, Peng; Tao, Dacheng
Format:	Journal Article
Language:	English
Published:	Piscataway: IEEE, 1 July 2021; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	0-1 integer programming; Computer science; Constraints; Correlation; Data analysis; Datasets; Equivalence; Fans; Feature extraction; Feature selection; feature selection (FS); Integer programming; Learning algorithms; Linear programming; Machine learning; nonconvex optimization; norm; Optimization; Robustness
ISSN:	2162-237X, 2162-2388

Description
Summary:	Feature selection (FS), which identifies the relevant features in a data set to facilitate subsequent data analysis, is a fundamental problem in machine learning and has been widely studied in recent years. Most FS methods rank the features in order of their scores based on a specific criterion and then select the $k$ top-ranked features, where $k$ is the number of desired features. However, these features are usually not the top-$k$ features and may present a suboptimal choice. To address this issue, we propose a novel FS framework in this article to select the exact top-$k$ features in the unsupervised, semisupervised, and supervised scenarios. The new framework utilizes the $\ell_{0,2}$-norm as the matrix sparsity constraint rather than its relaxations, such as the $\ell_{1,2}$-norm. Since the $\ell_{0,2}$-norm constrained problem is difficult to solve, we transform the discrete $\ell_{0,2}$-norm-based constraint into an equivalent 0-1 integer constraint and replace the 0-1 integer constraint with two continuous constraints. The obtained top-$k$ FS framework with two continuous constraints is theoretically equivalent to the $\ell_{0,2}$-norm constrained problem and can be optimized by the alternating direction method of multipliers (ADMM). Unsupervised and semisupervised FS methods are developed based on the proposed framework, and extensive experiments on real-world data sets are conducted to demonstrate the effectiveness of the proposed FS framework.
DOI:	10.1109/TNNLS.2020.3009209
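
The summary's central claim, that ranking features one at a time and keeping the $k$ best can miss the exact top-$k$ subset, can be illustrated on a toy problem. The sketch below is not the authors' method: it uses a supervised least-squares objective, absolute correlation as the ranking score, and brute-force enumeration of 0-1 indicator vectors in place of ADMM, all of which are assumptions chosen for brevity. The helper `is_binary_via_continuous` shows one well-known pair of continuous constraints (a box plus a sphere) whose intersection is exactly the 0-1 vectors; the summary does not say which pair the article uses. All function names and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np


def residual(X, y, s):
    """Squared least-squares residual using only the features flagged by the 0-1 vector s."""
    cols = np.flatnonzero(s)
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    return float(np.sum((X[:, cols] @ coef - y) ** 2))


def rank_then_cut(X, y, k):
    """Score each feature on its own (absolute correlation with y) and keep the k best."""
    d = X.shape[1]
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
    s = np.zeros(d, dtype=int)
    s[np.argsort(-scores)[:k]] = 1
    return s


def exact_top_k(X, y, k):
    """Enumerate every 0-1 vector with exactly k ones (only feasible for tiny d)."""
    d = X.shape[1]
    best_s, best_r = None, np.inf
    for idx in combinations(range(d), k):
        s = np.zeros(d, dtype=int)
        s[list(idx)] = 1
        r = residual(X, y, s)
        if r < best_r:
            best_s, best_r = s, r
    return best_s


def is_binary_via_continuous(s, tol=1e-9):
    """Assumed illustration: s is 0-1 iff 0 <= s <= 1 and ||s - 1/2||^2 = d/4."""
    s = np.asarray(s, dtype=float)
    in_box = np.all((s >= -tol) & (s <= 1 + tol))
    on_sphere = abs(np.sum((s - 0.5) ** 2) - s.size / 4) <= tol
    return bool(in_box and on_sphere)


def make_demo(n=200, seed=0):
    """Synthetic data: feature 1 nearly duplicates feature 0; feature 2 is complementary."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    X = np.column_stack([a, a + 0.01 * rng.standard_normal(n), b])
    y = a + 0.5 * b
    return X, y


if __name__ == "__main__":
    X, y = make_demo()
    s_rank = rank_then_cut(X, y, k=2)
    s_exact = exact_top_k(X, y, k=2)
    print("ranking picks:", s_rank, "residual:", residual(X, y, s_rank))
    print("exact picks:  ", s_exact, "residual:", residual(X, y, s_exact))
```

In this toy setup the two near-duplicate features both score highly on their own, so ranking selects the redundant pair, while the enumeration selects one of them together with the complementary feature. Enumeration scales combinatorially with the number of features, which is why a continuous reformulation solvable by ADMM, as the summary describes, is attractive.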