ECFS-DEA: an ensemble classifier-based feature selection for differential expression analysis on expression profiles


Bibliographic details
Published in: BMC Bioinformatics, Vol. 21, No. 1, p. 43 - 14
Main authors: Zhao, Xudong; Jiao, Qing; Li, Hangyu; Wu, Yiming; Wang, Hanxu; Huang, Shan; Wang, Guohua
Format: Journal Article
Language: English
Publication details: London, BioMed Central, 5 February 2020
BioMed Central Ltd
Springer Nature B.V
BMC
ISSN: 1471-2105
Description
Summary: Background: Various methods for differential expression analysis are widely used to identify the features that best distinguish between categories of samples. Multiple hypothesis testing may leave out explanatory features, each of which may be composed of individually insignificant variables. Multivariate hypothesis testing remains outside the mainstream because of the large computational overhead of large-scale matrix operations. Random forest provides a classification strategy for calculating variable importance; however, it may be unsuitable for some sample distributions. Results: Based on the idea of using an ensemble classifier, we develop a feature selection tool for differential expression analysis on expression profiles (ECFS-DEA for short). To account for differences in sample distribution, a graphical user interface allows the selection of different base classifiers. Inspired by random forest, a common measure applicable to any base classifier is proposed for calculating variable importance. After a feature is selected interactively from the sorted individual variables, a projection heatmap is presented using k-means clustering. A ROC curve is also provided; together, the heatmap and the ROC curve intuitively demonstrate the effectiveness of the selected feature. Conclusions: Feature selection through ensemble classifiers helps to select important variables and is therefore applicable to different sample distributions. Experiments on simulated and realistic data demonstrate the effectiveness of ECFS-DEA for differential expression analysis on expression profiles. The software is available at http://bio-nefu.com/resource/ecfs-dea.
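Illustrative note: the summary describes a variable-importance measure that works with any base classifier inside an ensemble, but it does not define the measure. The sketch below is therefore not the ECFS-DEA algorithm. It is a minimal stand-in, assuming a permutation-based importance computed on out-of-bag samples, with a nearest-centroid base classifier chosen only for brevity. All function names and parameters (`fit_centroid`, `ensemble_importance`, `n_models`, `n_feats`, `make_demo_data`) are hypothetical.

```python
import random
from statistics import mean

def fit_centroid(X, y, feats):
    """Hypothetical base classifier: nearest class centroid on the chosen features."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(r[f] for r in rows) for f in feats]

    def predict(x):
        # Assign x to the class whose centroid is closest (squared Euclidean distance).
        return min(cents, key=lambda c: sum((x[f] - m) ** 2
                                            for f, m in zip(feats, cents[c])))
    return predict

def accuracy(predict, X, y):
    return mean(1.0 if predict(x) == t else 0.0 for x, t in zip(X, y))

def ensemble_importance(X, y, fit=fit_centroid, n_models=50, n_feats=2, seed=0):
    """Assumed measure: mean drop in out-of-bag accuracy after permuting a variable."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    drops = [[] for _ in range(p)]
    for _ in range(n_models):
        boot = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]   # held-out samples
        if len(oob) < 2:
            continue
        feats = rng.sample(range(p), n_feats)            # random variable subset
        model = fit([X[i] for i in boot], [y[i] for i in boot], feats)
        X_oob, y_oob = [X[i] for i in oob], [y[i] for i in oob]
        base = accuracy(model, X_oob, y_oob)
        for f in feats:
            col = [x[f] for x in X_oob]
            rng.shuffle(col)                             # break the link between f and y
            X_perm = [x[:f] + [v] + x[f + 1:] for x, v in zip(X_oob, col)]
            drops[f].append(base - accuracy(model, X_perm, y_oob))
    return [mean(d) if d else 0.0 for d in drops]

def make_demo_data(n=60, p=4, seed=1):
    """Synthetic two-class data (an assumption): only variable 0 is informative."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[0] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y
```

Because `fit` is passed in as a parameter, any function that returns a `predict` callable could be substituted for the centroid classifier, which is the sense in which such a measure is independent of the base classifier.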
DOI: 10.1186/s12859-020-3388-y