Two data pre-processing workflows to facilitate the discovery of biomarkers by 2D NMR metabolomics
| Title: | Two data pre-processing workflows to facilitate the discovery of biomarkers by 2D NMR metabolomics |
|---|---|
| Authors: | Féraud, Baptiste; Leenders, Justine; Martineau, Estelle; Giraudeau, Patrick; Govaerts, Bernadette; de Tullio, Pascal |
| Contributors: | Institut de Statistique, Biostatistique et Sciences Actuarielles (ISBA), Université Catholique de Louvain = Catholic University of Louvain (UCL), Université de Liège = University of Liège = Universiteit van Luik = Universität Lüttich (ULiège), Chimie Et Interdisciplinarité : Synthèse, Analyse, Modélisation (CEISAM), Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-Institut de Chimie - CNRS Chimie (INC-CNRS)-Centre National de la Recherche Scientifique (CNRS) |
| Source: | ISSN: 1573-3882. |
| Publisher information: | CCSD Springer Verlag |
| Publication year: | 2019 |
| Collection: | Université de Nantes: HAL-UNIV-NANTES |
| Subjects: | 1H-NMR, 2D NMR, Biomarker discovery, COSY spectra, L-sOPLS, Metabolomic informative content (MIC), PLS, Pre-processing workflows, sPLS, MESH: Algorithms; Biomarkers; Computational Biology / methods*; Data Analysis; Magnetic Resonance Imaging / methods; Magnetic Resonance Spectroscopy / methods*; Metabolomics / methods*; Software; Workflow, [CHIM]Chemical Sciences |
| Description: | International audience; Introduction: The pre-processing of analytical data in metabolomics must be considered as a whole so that a single global object can be constructed for any further simultaneous data analysis or multivariate statistical modelling. For 1D 1H-NMR metabolomics experiments, best practices for data pre-processing are well defined, but this is not yet the case for 2D experiments (COSY in this paper). Objective: By considering the added value of a second dimension, the objective is to propose two workflows dedicated to 2D NMR data handling and preparation (the Global Peak List and Vectorization approaches) and to compare them, both with each other and with 1D standards. This makes it possible to identify which methodology yields the most metabolomic content and to explore the advantages of the selected workflow in distinguishing among treatment groups and identifying relevant biomarkers. This paper therefore explores the need for novel 2D pre-processing workflows, the evaluation of their quality, and the evaluation of their performance in the subsequent determination of accurate (2D) biomarkers. Methods: To select the more informative data source, MIC (Metabolomic Informative Content) indexes are used, based on clustering and inertia measures of quality. Then, to highlight biomarkers or critical spectral zones, the PLS-DA model is used, along with more advanced sparse algorithms (sPLS and L-sOPLS). Results: Results are discussed for two different experimental designs (one unsupervised and based on human urine samples, the other controlled and based on spiked serum media). MIC indexes are reported, leading to the choice of the more relevant workflow to use thereafter. Finally, biomarkers are provided for each case and the predictive power of each candidate model is assessed with cross-validated measures of RMSEP. Conclusion: It is shown that no solution is universally the best in every case, but that 2D experiments allow to ... |
| Document type: | article in journal/newspaper |
| Language: | English |
| Relation: | info:eu-repo/semantics/altIdentifier/pmid/30993405; PUBMED: 30993405 |
| DOI: | 10.1007/s11306-019-1524-3 |
| Availability: | https://hal.science/hal-03447217 https://hal.science/hal-03447217v1/document https://hal.science/hal-03447217v1/file/view https://doi.org/10.1007/s11306-019-1524-3 |
| Rights: | info:eu-repo/semantics/OpenAccess |
| Accession number: | edsbas.E20EE6BB |
| Database: | BASE |