Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

Bibliographic details
Published in:	BMC Bioinformatics, Vol. 12, No. 1, p. 390
Main authors:	Önskog, Jenny; Freyhult, Eva; Landfors, Mattias; Rydén, Patrik; Hvidsten, Torgeir R
Format:	Journal Article
Language:	English
Published:	London: BioMed Central, 07.10.2011; BioMed Central Ltd; Springer Nature B.V; BMC
Subjects:	Algorithms Artificial Intelligence Bioinformatics Biomedical and Life Sciences Cancer cell Classification Comparative genomics Comparative studies Computational Biology/Bioinformatics Computer Appl. in Life Sciences DNA microarrays expression features Gene expression Gene Expression Profiling Genetic aspects Humans Indexing in process Life Sciences Microarrays Neoplasms - genetics Oligonucleotide Array Sequence Analysis Research Article Software statistical methods Studies Support Vector Machine Synergistic effect tumors Variables Sweden Machine Learning Method Support Vector Machine Gene Selection Background Correction Hide Layer
ISSN:	1471-2105

Description
Summary:	Background: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is the result of a series of analysis steps, of which the most important are data normalization, gene selection and machine learning. Results: In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes, and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross-validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well-performing individual methods and synergies between different methods. Conclusion: Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the t-test and the selection of a relatively high number of genes. We also find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.
DOI:	10.1186/1471-2105-12-390
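
The summary describes estimating error rates with a double cross-validation approach in which gene selection (for example by t-test) and the classifier are evaluated together. The following is a minimal sketch of that general idea, not the authors' code: an outer loop estimates the error, while an inner loop, run on the outer training samples only, chooses how many genes to keep. Several things here are assumptions made for illustration: all function names are hypothetical, the data are synthetic, the score is a Welch t-statistic, and a simple nearest-centroid classifier stands in for the Support Vector Machines used in the study so that the example needs only the Python standard library.

```python
import random
from statistics import mean, variance


def t_scores(X, y):
    """Absolute Welch t-statistic of every gene (column) between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        scores.append(abs(mean(a) - mean(b)) / se if se > 0 else 0.0)
    return scores


def top_genes(X, y, k):
    """Indices of the k genes with the largest absolute t-statistic."""
    s = t_scores(X, y)
    return sorted(range(len(s)), key=lambda j: -s[j])[:k]


def fit_predict(X_tr, y_tr, X_te, genes):
    """Nearest-centroid stand-in classifier (NOT an SVM), restricted to `genes`."""
    cents = {}
    for c in (0, 1):
        rows = [r for r, label in zip(X_tr, y_tr) if label == c]
        cents[c] = [mean([r[j] for r in rows]) for j in genes]
    preds = []
    for r in X_te:
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(genes, cents[c])) for c in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def folds(idx, k):
    """Split idx into k interleaved folds (stratified when idx is sorted by class)."""
    return [idx[i::k] for i in range(k)]


def errors_on_fold(X, y, train, test, n_genes):
    """Select genes and fit on `train` only, then count mistakes on `test`."""
    X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
    genes = top_genes(X_tr, y_tr, n_genes)
    preds = fit_predict(X_tr, y_tr, [X[i] for i in test], genes)
    return sum(p != y[i] for p, i in zip(preds, test))


def cv_error(X, y, idx, k, n_genes):
    """Plain k-fold error rate over the samples in idx for a fixed gene count."""
    total = 0
    for test in folds(idx, k):
        train = [i for i in idx if i not in test]
        total += errors_on_fold(X, y, train, test, n_genes)
    return total / len(idx)


def double_cv_error(X, y, gene_counts, outer_k=5, inner_k=3, seed=0):
    """Outer loop estimates error; inner loop picks the gene count on training data."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    idx.sort(key=lambda i: y[i])  # stable sort: keeps the shuffle within each class
    total = 0
    for test in folds(idx, outer_k):
        train = [i for i in idx if i not in test]
        best = min(gene_counts, key=lambda g: cv_error(X, y, train, inner_k, g))
        total += errors_on_fold(X, y, train, test, best)
    return total / len(idx)


def make_toy_data(n_per_class=20, n_genes=30, n_informative=5, shift=4.0, seed=1):
    """Synthetic two-class data: only the first n_informative genes differ."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                      for j in range(n_genes)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(double_cv_error(X, y, gene_counts=[2, 5, 10]))
```

The key design point is that gene selection happens inside each training fold. Selecting genes on the full data set before cross-validation would let test samples influence the selection and would tend to give an optimistically biased error estimate.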