ECFS-DEA: an ensemble classifier-based feature selection for differential expression analysis on expression profiles


Bibliographic Details
Published in: BMC Bioinformatics, Vol. 21; no. 1; pp. 43 - 14
Main Authors: Zhao, Xudong, Jiao, Qing, Li, Hangyu, Wu, Yiming, Wang, Hanxu, Huang, Shan, Wang, Guohua
Format: Journal Article
Language: English
Published: London: BioMed Central, 05.02.2020
Publishers: BioMed Central Ltd; Springer Nature B.V; BMC
ISSN: 1471-2105
Description
Summary: Background: Various methods for differential expression analysis have been widely used to identify features that best distinguish between different categories of samples. Multiple hypothesis testing may miss explanatory features, each of which may be composed of individually insignificant variables. Multivariate hypothesis testing remains outside the mainstream because of the large computational overhead of large-scale matrix operations. Random forest provides a classification-based strategy for calculating variable importance; however, it may be unsuitable for some sample distributions.

Results: Based on the idea of using an ensemble classifier, we develop a feature selection tool for differential expression analysis on expression profiles (ECFS-DEA for short). To account for differences in sample distribution, a graphical user interface is designed to allow the selection of different base classifiers. Inspired by random forest, we propose a common measure of variable importance that is applicable to any base classifier. After a feature is interactively selected from the sorted individual variables, a projection heatmap is presented using k-means clustering. An ROC curve is also provided; both the heatmap and the ROC curve intuitively demonstrate the effectiveness of the selected feature.

Conclusions: Feature selection through ensemble classifiers helps to select important variables and is thus applicable to different sample distributions. Experiments on simulated and real data demonstrate the effectiveness of ECFS-DEA for differential expression analysis on expression profiles. The software is available at http://bio-nefu.com/resource/ecfs-dea.
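The summary describes a random-forest-inspired importance measure that works with any base classifier, plus an ROC curve for the selected feature, but does not give the formula. The sketch below assumes the measure resembles random forest's out-of-bag permutation importance: train each base classifier on a bootstrap sample, then record how much out-of-bag accuracy drops when one variable is shuffled. It is not the ECFS-DEA implementation. The names `fit_centroid`, `ensemble_importance` and `roc_auc` are hypothetical, and the nearest-centroid base classifier and simulated data are illustrative choices only.

```python
import random
from statistics import mean


def fit_centroid(X, y):
    """Hypothetical base classifier: assign a sample to the nearest class centroid.

    Any function that takes (X, y) and returns a predict(x) callable can be
    swapped in, which is what makes the importance measure classifier-agnostic.
    """
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        # Pick the class whose centroid has the smallest squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def accuracy(predict, X, y):
    return mean(1.0 if predict(x) == t else 0.0 for x, t in zip(X, y))


def ensemble_importance(X, y, fit=fit_centroid, n_estimators=50, seed=0):
    """Assumed RF-style measure: mean out-of-bag accuracy drop after permuting each variable."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    drops = [[] for _ in range(p)]
    for _ in range(n_estimators):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        oob = sorted(set(range(n)) - set(boot))      # samples left out of this bootstrap
        if not oob or len({y[i] for i in boot}) < 2:
            continue  # skip degenerate bootstraps
        predict = fit([X[i] for i in boot], [y[i] for i in boot])
        X_oob = [X[i] for i in oob]
        y_oob = [y[i] for i in oob]
        base = accuracy(predict, X_oob, y_oob)
        for j in range(p):
            col = [x[j] for x in X_oob]
            rng.shuffle(col)  # break the link between variable j and the labels
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X_oob, col)]
            drops[j].append(base - accuracy(predict, X_perm, y_oob))
    return [mean(d) if d else 0.0 for d in drops]


def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Simulated profiles: 60 samples x 5 variables; only variable 0 differs between classes.
rng = random.Random(1)
y = [0] * 30 + [1] * 30
X = [[rng.gauss(3.0 * t, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(4)] for t in y]

imp = ensemble_importance(X, y)
ranking = sorted(range(len(imp)), key=lambda j: -imp[j])
print("variables ranked by importance:", ranking)
print("AUC of top-ranked variable:", round(roc_auc([row[ranking[0]] for row in X], y), 3))
```

Passing the base learner in as `fit` mirrors the summary's point that the base classifier can be chosen to suit the sample distribution; the bootstrap and out-of-bag steps are borrowed from random forest. The k-means projection heatmap is not sketched here.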
DOI: 10.1186/s12859-020-3388-y