Comparing normalization methods and the impact of noise
| Published in: | Metabolomics Vol. 14; no. 8; pp. 108 - 10 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York; Springer US; 01.08.2018; Springer Nature B.V |
| Subjects: | |
| ISSN: | 1573-3882, 1573-3890 |
| Summary: | Introduction
Failure to properly account for normal systematic variations in OMICS datasets may result in misleading biological conclusions. Accordingly, normalization is a necessary step in the proper preprocessing of OMICS datasets. In this regard, an optimal normalization method will effectively reduce unwanted biases and increase the accuracy of downstream quantitative analyses. However, it is currently unclear which normalization method is best, since each algorithm addresses systematic noise in a different way.
Objective
To determine the optimal normalization method for preprocessing metabolomics datasets.
Methods
Nine MVAPACK normalization algorithms were compared using simulated and experimental NMR spectra modified with added Gaussian noise and random dilution factors. Methods were evaluated on their ability to recover the intensities of the true spectral peaks and on the reproducibility of the true classifying features from an orthogonal projections to latent structures discriminant analysis (OPLS-DA) model.
Results
Most normalization methods (except histogram matching) performed equally well at modest levels of signal variance. Only probabilistic quotient (PQ) and constant sum (CS) maintained the highest level of peak recovery (> 67%) and correlation with true loadings (> 0.6) at maximal noise.
Conclusion
PQ and CS performed the best at recovering peak intensities and reproducing the true classifying features for an OPLS-DA model regardless of spectral noise level. Our findings suggest that performance is largely determined by the level of noise in the dataset, while the effect of dilution factors was negligible. A maximal allowable noise level of 20% was also identified for a valid NMR metabolomics dataset. |
|---|---|
| Bibliography: | Author contributions: TV and ER performed the experiments; RP and YQ designed the experiments; TV, ER, YQ, and RP analyzed the data and wrote the manuscript. |
| DOI: | 10.1007/s11306-018-1400-6 |
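
The summary above names constant sum (CS) and probabilistic quotient (PQ) normalization as the two best-performing methods on spectra perturbed by random dilution factors and Gaussian noise. The following is a minimal sketch of how these two methods are commonly described, written in plain Python for illustration only. It is not MVAPACK's implementation and not the authors' code. The function names, the median reference spectrum, the dilution range of 0.5 to 2.0, and the clipping of noisy intensities to a small positive value are all assumptions made for this example.

```python
import random
import statistics


def constant_sum(spectrum, total=1.0):
    """Scale one spectrum so that its intensities sum to `total`."""
    s = sum(spectrum)
    return [total * x / s for x in spectrum]


def probabilistic_quotient(spectra):
    """PQ normalization using the median spectrum as the reference.

    Assumption: spectra are first constant-sum normalized, and the
    reference is the per-variable median across all samples.
    """
    pre = [constant_sum(s) for s in spectra]
    # Reference spectrum: median of each variable (column) across samples.
    reference = [statistics.median(col) for col in zip(*pre)]
    normalized = []
    for s in pre:
        # Quotient of each variable against the reference spectrum.
        quotients = [x / r for x, r in zip(s, reference) if r > 0]
        # The median quotient estimates the most probable dilution factor.
        factor = statistics.median(quotients)
        normalized.append([x / factor for x in s])
    return normalized


def simulate(true_spectrum, n_samples, noise_sd, rng):
    """Perturb a true spectrum with a random dilution and Gaussian noise.

    Hypothetical setup: dilution drawn uniformly from [0.5, 2.0];
    intensities are clipped to stay positive after noise is added.
    """
    samples = []
    for _ in range(n_samples):
        dilution = rng.uniform(0.5, 2.0)
        samples.append(
            [max(dilution * x + rng.gauss(0.0, noise_sd), 1e-9)
             for x in true_spectrum]
        )
    return samples
```

With `noise_sd` set to 0, both functions should return the constant-sum form of the true spectrum for every sample, since a pure dilution is a single multiplicative factor per spectrum. Raising `noise_sd` allows a rough comparison of how closely each method recovers the true intensities, which is the kind of evaluation the summary describes.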