VTrans: A VAE-Based Pre-Trained Transformer Method for Microbiome Data Analysis

Bibliographic details
Published in: Journal of computational biology, Volume 32, Issue 9, p. 850
Main authors: Shi, Xinyuan; Zhu, Fangfang; Min, Wenwen
Format: Journal Article
Language: English
Published: United States, 1 September 2025
ISSN: 1557-8666
Abstract Predicting survival outcomes and assessing patient risk play a pivotal role in understanding microbial composition across the stages of cancer. With ongoing advances in deep learning, it has been shown that deep learning can analyze patient survival risk from microbial data. However, individual cancer datasets commonly combine a limited sample size with a high-dimensional feature space. This often leads to overfitting in deep learning models, hindering their ability to extract deep data representations and resulting in suboptimal performance. To overcome these challenges, we advocate pretraining and fine-tuning strategies, which have proven effective in addressing the small sample sizes of individual cancer datasets. In this study, we propose VTrans, a deep learning model that combines a Transformer encoder with a variational autoencoder (VAE) and employs both pretraining and fine-tuning to predict the survival risk of cancer patients from microbial data. Furthermore, we highlight the potential of extending VTrans to integrate microbial multi-omics data. Our method is assessed on three distinct cancer datasets from The Cancer Genome Atlas Program, and the findings demonstrate that (1) VTrans outperforms conventional machine learning and other deep learning models; (2) pretraining significantly enhances its performance; (3) compared with positional encoding, VAE encoding is more effective in enriching data representation; and (4) using the idea of a saliency map, it is possible to observe which microbes contribute strongly to the classification results. These results demonstrate the effectiveness of VTrans in predicting patient survival risk.
Source code and all datasets used in this paper are available at https://github.com/wenwenmin/VTrans and https://doi.org/10.5281/zenodo.14166580.
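Finding (4) relies on the general idea of a saliency map: score each input feature by how strongly the model output changes when that feature is perturbed. The sketch below illustrates only that general idea and is not the VTrans implementation. The logistic classifier, the taxon names, the weights, and the sample values are all hypothetical stand-ins chosen for illustration; the authors' actual code is in the repository linked above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(abundances, weights, bias):
    """Toy stand-in classifier (assumption): logistic model over abundances."""
    z = sum(w * x for w, x in zip(weights, abundances)) + bias
    return sigmoid(z)


def saliency(abundances, weights, bias, eps=1e-6):
    """Absolute central-difference gradient of the predicted risk
    with respect to each input feature."""
    scores = []
    for i in range(len(abundances)):
        up = list(abundances)
        up[i] += eps
        down = list(abundances)
        down[i] -= eps
        grad = (predict_risk(up, weights, bias)
                - predict_risk(down, weights, bias)) / (2 * eps)
        scores.append(abs(grad))
    return scores


# Hypothetical taxa, model parameters, and one sample (not from the paper).
taxa = ["taxon_A", "taxon_B", "taxon_C"]
weights = [2.0, -0.5, 0.1]
bias = -0.3
sample = [0.2, 0.5, 0.3]

scores = saliency(sample, weights, bias)
# Rank taxa from highest to lowest saliency.
ranking = [t for _, t in sorted(zip(scores, taxa), reverse=True)]
print(ranking)
```

For a deep model the same scores are usually obtained by backpropagating the class score to the input rather than by finite differences; the finite-difference form is used here only to keep the sketch dependency-free.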
Author Min, Wenwen
Zhu, Fangfang
Shi, Xinyuan
Author_xml – sequence: 1
  givenname: Xinyuan
  surname: Shi
  fullname: Shi, Xinyuan
  organization: School of Information Science and Engineering, Yunnan University, Kunming, China
– sequence: 2
  givenname: Fangfang
  surname: Zhu
  fullname: Zhu, Fangfang
  organization: School of Health and Nursing, Yunnan Open University, Kunming, China
– sequence: 3
  givenname: Wenwen
  orcidid: 0000-0002-2558-2911
  surname: Min
  fullname: Min, Wenwen
  organization: School of Information Science and Engineering, Yunnan University, Kunming, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40295093$$D View this record in MEDLINE/PubMed
ContentType Journal Article
DBID CGR
CUY
CVF
ECM
EIF
NPM
7X8
DOI 10.1089/cmb.2024.0884
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Biology
Mathematics
EISSN 1557-8666
ExternalDocumentID 40295093
Genre Journal Article
IEDL.DBID 7X8
ISICitedReferencesCount 0
ISSN 1557-8666
IngestDate Sat Nov 01 14:26:56 EDT 2025
Wed Sep 03 02:28:26 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 9
Keywords saliency map
Transformer
multihead-co-attention
variational autoencoder
microbiome data
pretraining
Language English
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ORCID 0000-0002-2558-2911
PMID 40295093
PQID 3196608709
PQPubID 23479
ParticipantIDs proquest_miscellaneous_3196608709
pubmed_primary_40295093
PublicationCentury 2000
PublicationDate 2025-09-01
PublicationDateYYYYMMDD 2025-09-01
PublicationDate_xml – month: 09
  year: 2025
  text: 2025-09-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Journal of computational biology
PublicationTitleAlternate J Comput Biol
PublicationYear 2025
SSID ssj0013607
Score 2.4435947
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 850
SubjectTerms Algorithms
Computational Biology - methods
Deep Learning
Humans
Microbiota - genetics
Neoplasms - genetics
Neoplasms - microbiology
Neoplasms - mortality
Title VTrans: A VAE-Based Pre-Trained Transformer Method for Microbiome Data Analysis
URI https://www.ncbi.nlm.nih.gov/pubmed/40295093
https://www.proquest.com/docview/3196608709
Volume 32