Supervised Contrastive Learning Leads to More Reasonable Spectral Embeddings

Detailed bibliography
Title: Supervised Contrastive Learning Leads to More Reasonable Spectral Embeddings
Authors: Peng Xiong, Hongtao Xu, Haoran Zheng
Publication year: 2025
Collection: Bath Spa University: Figshare
Subjects: Biochemistry, Biotechnology, Mental Health, Space Science, Biological Sciences not elsewhere classified, Chemical Sciences not elsewhere classified, Information Systems not elsewhere classified, transformer encoder architecture, substantially outperforming msbert, reasonable spectral embeddings, quality spectral embeddings, offered new solutions, mtbls1572 data set, mona data set, https :// huggingface, https :// github, gnps training subset, gnps test subset, dimensional vector representations, 1 hit ratio, experimental results show, specembedding significantly enhances, employs replicated spectra, experimental conditions, widely applied, web service, specembedding achieves, source code, recent advancements, publicly available
Description: Over the past decades, mass spectrometry has served as a fundamental technique for molecular identification in the field of metabolomics, widely applied to the analysis and characterization of biomolecules. However, the complexity of experimental conditions and the structural similarities among compounds pose significant challenges for accurate identification. Recent advancements in deep learning have offered new solutions to address these challenges, particularly demonstrating great potential in generating high-quality spectral embeddings. In this study, we propose a novel method, SpecEmbedding, which leverages a transformer encoder architecture and employs replicated spectra of compounds as positive samples, trained under a supervised contrastive learning framework. By mapping complex mass spectra into low-dimensional vector representations, SpecEmbedding significantly enhances the comparability between spectra, thereby improving identification accuracy. We trained SpecEmbedding on the GNPS training subset and evaluated its performance on the GNPS test subset, the MoNA data set, and the MTBLS1572 data set. Experimental results show that SpecEmbedding achieves a Top-1 hit ratio of 81.73% on the GNPS test subset, substantially outperforming MSBERT (77.81%) and DreaMS (71.90%). The source code for this study is publicly available at https://github.com/sword-nan/SpecEmbedding. A web service is provided at https://huggingface.co/spaces/xp113280/SpecEmbedding.
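Illustrative note: the abstract names two ideas that a short sketch may make more concrete, namely a supervised contrastive loss in which replicated spectra of the same compound act as positives, and a Top-1 hit ratio computed by nearest-neighbour search over embeddings. The code below is a minimal sketch under stated assumptions and is not the authors' implementation (that lives in the repository linked above). It assumes the widely used supervised contrastive loss formulation with cosine similarity and a temperature, and it assumes embeddings are already available as plain lists of floats. The function names, the temperature value, and the compound labels are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (assumed standard formulation).

    Spectra sharing a label (same compound) are positives for each other;
    every other spectrum in the batch is a negative. The temperature is a
    hypothetical default, not a value taken from the paper.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a replicate contributes nothing
        logits = {a: dot(z[i], z[a]) / temperature for a in range(n) if a != i}
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits.values())
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def top1_hit_ratio(query_embeddings, query_labels, library_embeddings, library_labels):
    """Fraction of queries whose most similar library entry has the same label."""
    library = [l2_normalize(v) for v in library_embeddings]
    hits = 0
    for query, label in zip(query_embeddings, query_labels):
        q = l2_normalize(query)
        scores = [dot(q, entry) for entry in library]
        best = max(range(len(library)), key=scores.__getitem__)
        hits += library_labels[best] == label
    return hits / len(query_embeddings)
```

In this sketch the loss is small when replicates of one compound lie close together and far from other compounds, which is the property that makes a nearest-neighbour lookup such as `top1_hit_ratio` meaningful.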
Document type: article in journal/newspaper
Language: unknown
Relation: https://figshare.com/articles/journal_contribution/Supervised_Contrastive_Learning_Leads_to_More_Reasonable_Spectral_Embeddings/30118561
DOI: 10.1021/acs.analchem.5c02655.s001
Availability: https://doi.org/10.1021/acs.analchem.5c02655.s001
https://figshare.com/articles/journal_contribution/Supervised_Contrastive_Learning_Leads_to_More_Reasonable_Spectral_Embeddings/30118561
Rights: CC BY-NC 4.0
Accession number: edsbas.B34E29D3
Database: BASE