VTrans: A VAE-Based Pre-Trained Transformer Method for Microbiome Data Analysis
Predicting the survival outcomes and assessing the risk of patients play a pivotal role in comprehending the microbial composition across various stages of cancer. With the ongoing advancements in deep learning, it has been substantiated that deep learning holds the potential to analyze patient surv...
| Published in: | Journal of computational biology Vol. 32; no. 9; p. 850 |
|---|---|
| Main authors: | Shi, Xinyuan; Zhu, Fangfang; Min, Wenwen |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 1 September 2025 |
| Subjects: | |
| ISSN: | 1557-8666 |
| Abstract | Predicting the survival outcomes and assessing the risk of patients play a pivotal role in comprehending the microbial composition across various stages of cancer. With the ongoing advancements in deep learning, it has been substantiated that deep learning holds the potential to analyze patient survival risks based on microbial data. However, a common challenge in individual cancer datasets is the limited sample size combined with the high dimensionality of the feature space. This predicament often leads to overfitting in deep learning models, hindering their ability to extract deep data representations and resulting in suboptimal model performance. To overcome these challenges, we advocate the use of pretraining and fine-tuning strategies, which have proven effective in addressing the constraint of small sample sizes in individual cancer datasets. In this study, we propose VTrans, a deep learning model that combines a Transformer encoder with a variational autoencoder (VAE) and employs both pretraining and fine-tuning strategies to predict the survival risk of cancer patients from microbial data. Furthermore, we highlight the potential of extending VTrans to integrate microbial multi-omics data. Our method is assessed on three distinct cancer datasets from The Cancer Genome Atlas Program, and the findings demonstrate that (1) VTrans outperforms conventional machine learning and other deep learning models; (2) pretraining significantly enhances its performance; (3) compared with positional encoding, VAE encoding is more effective in enriching the data representation; and (4) using the idea of a saliency map, it is possible to observe which microbes contribute strongly to the classification results. These results demonstrate the effectiveness of VTrans in predicting patient survival risk. 
Source code and all datasets used in this paper are available at https://github.com/wenwenmin/VTrans and https://doi.org/10.5281/zenodo.14166580. |
|---|---|
| Author | Min, Wenwen; Zhu, Fangfang; Shi, Xinyuan |
| Author_xml | – sequence: 1 givenname: Xinyuan surname: Shi fullname: Shi, Xinyuan organization: School of Information Science and Engineering, Yunnan University, Kunming, China – sequence: 2 givenname: Fangfang surname: Zhu fullname: Zhu, Fangfang organization: School of Health and Nursing, Yunnan Open University, Kunming, China – sequence: 3 givenname: Wenwen orcidid: 0000-0002-2558-2911 surname: Min fullname: Min, Wenwen organization: School of Information Science and Engineering, Yunnan University, Kunming, China |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40295093$$D View this record in MEDLINE/PubMed |
| ContentType | Journal Article |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1089/cmb.2024.0884 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Biology Mathematics |
| EISSN | 1557-8666 |
| ExternalDocumentID | 40295093 |
| Genre | Journal Article |
| GroupedDBID | --- 0R~ 29K 34G 39C 4.4 53G 5GY ABBKN ABEFU ACGFO ADBBV AENEX AI. ALMA_UNASSIGNED_HOLDINGS BAWUL BNQNF CAG CGR COF CS3 CUY CVF D-I DIK DU5 EBS ECM EIF EJD F5P IAO IER IGS IHR IM4 ITC MV1 NPM NQHIM O9- P2P R.V RML RMSOB RNS SCNPE TN5 TR2 UE5 VH1 7X8 J8X SAUOL SFC |
| ID | FETCH-LOGICAL-c258t-ae8ca16d8ea48175ffbbce4a0e16b5cef2930d99bebb5a5752a898a969690fbf2 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001477523800001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1557-8666 |
| IngestDate | Sat Nov 01 14:26:56 EDT 2025 Wed Sep 03 02:28:26 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Keywords | saliency map; Transformer; multihead-co-attention; variational autoencoder; microbiome data; pretraining |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c258t-ae8ca16d8ea48175ffbbce4a0e16b5cef2930d99bebb5a5752a898a969690fbf2 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ORCID | 0000-0002-2558-2911 |
| PMID | 40295093 |
| PQID | 3196608709 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_3196608709 pubmed_primary_40295093 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-09-01 |
| PublicationDateYYYYMMDD | 2025-09-01 |
| PublicationDate_xml | – month: 09 year: 2025 text: 2025-09-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Journal of computational biology |
| PublicationTitleAlternate | J Comput Biol |
| PublicationYear | 2025 |
| SSID | ssj0013607 |
| Score | 2.4435947 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 850 |
| SubjectTerms | Algorithms; Computational Biology - methods; Deep Learning; Humans; Microbiota - genetics; Neoplasms - genetics; Neoplasms - microbiology; Neoplasms - mortality |
| Title | VTrans: A VAE-Based Pre-Trained Transformer Method for Microbiome Data Analysis |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/40295093 https://www.proquest.com/docview/3196608709 |
| Volume | 32 |
| WOSCitedRecordID | wos001477523800001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=VTrans%3A+A+VAE-Based+Pre-Trained+Transformer+Method+for+Microbiome+Data+Analysis&rft.jtitle=Journal+of+computational+biology&rft.au=Shi%2C+Xinyuan&rft.au=Zhu%2C+Fangfang&rft.au=Min%2C+Wenwen&rft.date=2025-09-01&rft.issn=1557-8666&rft.eissn=1557-8666&rft.volume=32&rft.issue=9&rft.spage=850&rft_id=info:doi/10.1089%2Fcmb.2024.0884&rft.externalDBID=NO_FULL_TEXT |