An efficient multivariate feature ranking method for gene selection in high-dimensional microarray data

Bibliographic details
Published in: Expert Systems with Applications, Volume 166; p. 113971
Main authors: Lee, Junghye; Choi, In Young; Jun, Chi-Hyuck
Format: Journal Article
Language: English
Published: New York, Elsevier Ltd, 15 March 2021
Elsevier BV
ISSN: 0957-4174, 1873-6793
Description
Summary:
• Classification of microarray data plays a significant role in the diagnosis of cancer.
• Feature selection is necessary for better analysis because of the high dimensionality of the data.
• An efficient multivariate feature selection method is proposed for microarray data.
• We demonstrate its high accuracy and good efficiency using real data.
• The method outperforms other comparable gene selection methods in terms of accuracy.

Classification of microarray data plays a significant role in the diagnosis and prediction of cancer. However, its high dimensionality (>tens of thousands) compared to the number of observations (<tens of hundreds) may lead to poor classification accuracy. In addition, only a fraction of the genes is actually important for classifying a given cancer, so feature selection is essential in this field. Because of the time and memory burden of processing high-dimensional data, univariate feature ranking methods are widely used in gene selection. However, most of them are not very accurate, because they consider only the relevance of features to the target and ignore the redundancy among features. In this study, we propose a novel multivariate feature ranking method to improve the quality of gene selection and, ultimately, the accuracy of microarray data classification. The method can be applied efficiently to high-dimensional microarray data. We embedded the formal definition of relevance into a Markov blanket (MB) to create a new feature ranking method. Using a few microarray datasets, we demonstrated the practicability of MB-based feature ranking, which showed high accuracy and good efficiency. The method outperformed commonly used univariate ranking methods and also yielded better results than the other multivariate feature ranking method, owing to its advantage in data efficiency.
DOI: 10.1016/j.eswa.2020.113971