Large-scale classification of metagenomic samples: a comparative analysis of classical machine learning techniques vs a novel brain-inspired hyperdimensional computing approach
Classical machine learning techniques have revolutionized bioinformatics, enabling researchers to extract knowledge from complex biological data. However, these techniques often struggle with high-dimensional data, where the increasing number of features leads to decreased performance, also affectin...
| Published in: | bioRxiv |
|---|---|
| Main Authors: | Joshi, Jayadev; Cumbo, Fabio; Blankenberg, Daniel |
| Format: | Journal Article Paper |
| Language: | English |
| Published: | United States, Cold Spring Harbor Laboratory, 07.07.2025 |
| Edition: | 1.1 |
| Subjects: | |
| ISSN: | 2692-8205 |
| Abstract | Classical machine learning techniques have revolutionized bioinformatics, enabling researchers to extract knowledge from complex biological data. However, these techniques often struggle with high-dimensional data, where the increasing number of features leads to decreased performance and lower model accuracy. To address this problem, we explore hyperdimensional computing (HDC), an emerging brain-inspired computational paradigm that leverages high-dimensional vectors and simple arithmetic operations to represent and manipulate complex patterns, as an alternative approach to supervised machine learning. In this work, we present a comprehensive comparative analysis of HDC against established machine learning techniques across a range of classification tasks. As a representative use case, we focus on classifying heterogeneous metagenomic samples based on their quantitative microbial profiles, using publicly available microbiome datasets. Our results demonstrate that HDC achieves comparable, and in some cases superior, classification accuracy to classical methods. Furthermore, our findings highlight the potential of HDC for improved computational efficiency, particularly on large-scale datasets, suggesting the HDC-based classifier as a promising tool for bioinformatics research, especially in areas characterized by high-dimensional data. We also offer a Galaxy-powered toolset for analyzing your own datasets, generating reproducible workflows, and adopting these methods in your own research with ease. Our investigation into the application of an HDC-based supervised machine learning technique for classifying microbial profiles in metagenomic samples yielded promising results, demonstrating the potential of this novel computational paradigm to complement and, in some cases, surpass the performance of well-established machine learning techniques. |
|---|---|
| AbstractList | Importance: The growing complexity and dimensionality of biological data require more efficient and scalable machine learning approaches. HDC offers a novel alternative to conventional methods, showing resilience to high dimensionality while maintaining competitive accuracy. This study demonstrates the effectiveness of HDC in classifying metagenomic samples based on their microbial composition. Our results suggest that HDC not only matches but sometimes exceeds the performance of well-established methods. We make this approach accessible to the broader bioinformatics community with an open-source tool fully integrated into the Galaxy platform, facilitating its adoption and reproducibility, with the aim of integrating HDC into mainstream biological data analysis pipelines, especially for complex, high-dimensional tasks in microbiome research. |
| Author | Blankenberg, Daniel Cumbo, Fabio Joshi, Jayadev |
| Author_xml | – sequence: 1 givenname: Jayadev orcidid: 0000-0001-7589-5230 surname: Joshi fullname: Joshi, Jayadev organization: Center for Computational Life Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA – sequence: 2 givenname: Fabio orcidid: 0000-0003-2920-5838 surname: Cumbo fullname: Cumbo, Fabio organization: Center for Computational Life Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA – sequence: 3 givenname: Daniel orcidid: 0000-0002-6833-9049 surname: Blankenberg fullname: Blankenberg, Daniel organization: Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40672168 (View this record in MEDLINE/PubMed) |
| Cites_doi | 10.3390/nu13082638 10.1186/s40168-018-0531-3 10.7554/eLife.65088 10.1038/s41467-020-15457-9 10.1038/nbt.3960 10.1016/j.cels.2016.10.004 10.1038/ncomms7528 10.1038/nature18927 10.1136/fmch-2019-000262 10.1093/nar/gkac247 10.1093/bib/bbaf177 10.1016/j.chom.2018.06.005 10.1038/nature17672 10.1038/nmeth.4468 10.1038/s41467-018-07019-x 10.1038/s41467-017-02018-w 10.1093/nar/gkae410 10.1016/j.neuroimage.2019.06.023 10.1038/ismej.2016.37 10.1007/s41745-023-00370-z 10.1038/nature11450 10.1007/978-1-0716-2986-4_10 10.3390/a13090233 10.1038/s44220-023-00145-6 10.1146/annurev.genet.38.072902.091216 10.1186/s12859-022-04727-6 10.3389/fcimb.2024.1429197 10.1038/s41591-019-0458-7 10.1109/MCAS.2020.2988388 10.1038/s41467-017-00900-1 10.1016/j.chom.2018.06.007 10.1016/j.cell.2016.04.007 10.1136/gutjnl-2015-309800 10.1111/jgh.15502 10.1038/s41591-019-0406-6 10.21105/joss.05704 10.1186/s12934-022-01973-4 10.1038/s41591-020-0963-8 10.1038/s41522-020-00155-7 10.1016/j.cell.2016.10.020 10.1038/nbt.2939 10.1038/nbt.2942 10.7717/peerj-cs.2885 10.1038/nature23889 10.1016/j.csbj.2021.04.054 10.1038/s43705-022-00182-9 10.15252/msb.20145645 10.1016/j.nutres.2017.04.003 10.1038/nrg.2017.63 10.1093/gigascience/giad083 10.1093/gigascience/giz042 10.3390/foods12112140 10.1186/s13059-020-02020-4 10.1128/mBio.00434-20 10.1038/s41591-022-01695-5 10.1038/s41586-019-1560-1 10.1038/bjc.2015.465 10.1016/j.beth.2020.05.002 10.1186/s40168-017-0261-y 10.1111/jcpe.12087 10.1093/nar/gkab1019 10.1186/s12575-022-00179-7 10.1101/gr.4086505 10.1017/gmb.2023.14 10.1038/srep34826 10.1145/3558000 10.1371/journal.pcbi.1012426 10.1128/genomeA.00890-16 10.3389/fmicb.2023.1257002 |
| ContentType | Journal Article Paper |
| Copyright | 2025, Posted by Cold Spring Harbor Laboratory |
| Copyright_xml | – notice: 2025, Posted by Cold Spring Harbor Laboratory |
| DBID | NPM 7X8 FX. 5PM |
| DOI | 10.1101/2025.07.06.663394 |
| DatabaseName | PubMed MEDLINE - Academic bioRxiv PubMed Central (Full Participant titles) |
| DatabaseTitle | PubMed MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic PubMed |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Biology |
| EISSN | 2692-8205 |
| Edition | 1.1 |
| ExternalDocumentID | PMC12265723 2025.07.06.663394v1 40672168 |
| Genre | Journal Article Preprint |
| GrantInformation_xml | – fundername: NCI NIH HHS grantid: U24 CA231877 – fundername: NHGRI NIH HHS grantid: U24 HG006620 |
| ISSN | 2692-8205 |
| IngestDate | Tue Nov 04 02:03:28 EST 2025 Sat Jul 12 18:20:13 EDT 2025 Fri Sep 05 15:40:33 EDT 2025 Sat Aug 02 01:40:58 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| License | This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0. This work is licensed under a Creative Commons Attribution 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
| LinkModel | OpenURL |
| Notes | Competing Interest Statement: Daniel Blankenberg has a significant financial interest in GalaxyWorks, a company that may have a commercial interest in the results of this research and technology. This potential conflict of interest has been reviewed and is managed by the Cleveland Clinic. |
| ORCID | 0000-0002-6833-9049 0000-0003-2920-5838 0000-0001-7589-5230 |
| OpenAccessLink | https://pubmed.ncbi.nlm.nih.gov/PMC12265723 |
| PMID | 40672168 |
| PQID | 3230912729 |
| PQPubID | 23479 |
| PageCount | 32 |
| ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_12265723 biorxiv_primary_2025_07_06_663394 proquest_miscellaneous_3230912729 pubmed_primary_40672168 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-Jul-07 20250707 |
| PublicationDateYYYYMMDD | 2025-07-07 |
| PublicationDate_xml | – month: 07 year: 2025 text: 2025-Jul-07 day: 07 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | bioRxiv |
| PublicationTitleAlternate | bioRxiv |
| PublicationYear | 2025 |
| Publisher | Cold Spring Harbor Laboratory |
| Publisher_xml | – name: Cold Spring Harbor Laboratory |
| References_xml | – volume: 13 start-page: 2638 year: 2021 ident: 2025.07.06.663394v1.59 article-title: Differential Effects of Western and Mediterranean-Type Diets on Gut Microbiota: A Metagenomics and Metabolomics Approach publication-title: Nutrients doi: 10.3390/nu13082638 – volume: 6 start-page: 1 year: 2018 end-page: 12 ident: 2025.07.06.663394v1.63 article-title: Influence of cigarette smoking on the human duodenal mucosa-associated microbiota publication-title: Microbiome doi: 10.1186/s40168-018-0531-3 – volume: 10 year: 2021 ident: 2025.07.06.663394v1.23 article-title: Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3 publication-title: Elife doi: 10.7554/eLife.65088 – volume: 11 start-page: 1 year: 2020 end-page: 10 ident: 2025.07.06.663394v1.33 article-title: Metagenome-wide association of gut microbiome features for schizophrenia publication-title: Nature Communications doi: 10.1038/s41467-020-15457-9 – volume: 35 start-page: 1069 year: 2017 end-page: 1076 ident: 2025.07.06.663394v1.49 article-title: Towards standards for human fecal sample processing in metagenomic studies publication-title: Nature Biotechnology doi: 10.1038/nbt.3960 – volume: 3 start-page: 572 year: 2016 end-page: 584.e3 ident: 2025.07.06.663394v1.27 publication-title: Cell Systems doi: 10.1016/j.cels.2016.10.004 – volume: 6 start-page: 1 year: 2015 end-page: 13 ident: 2025.07.06.663394v1.54 article-title: Gut microbiome development along the colorectal adenoma–carcinoma sequence publication-title: Nature Communications doi: 10.1038/ncomms7528 – volume: 535 start-page: 435 year: 2016 end-page: 439 ident: 2025.07.06.663394v1.44 article-title: Mobile genes in the human microbiome are structured from global to individual scales publication-title: Nature doi: 10.1038/nature18927 – volume: 8 start-page: e000262 year: 2020 ident: 2025.07.06.663394v1.26 article-title: Variable selection strategies and its importance in clinical 
prediction modelling publication-title: Fam Med Community Health doi: 10.1136/fmch-2019-000262 – volume: 50 start-page: W345 year: 2022 end-page: W351 ident: 2025.07.06.663394v1.64 article-title: The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update publication-title: Nucleic Acids Res doi: 10.1093/nar/gkac247 – volume: 26 year: 2025 ident: 2025.07.06.663394v1.25 article-title: Feature selection with vector-symbolic architectures: a case study on microbial profiles of shotgun metagenomic samples of colorectal cancer publication-title: Briefings in Bioinformatics doi: 10.1093/bib/bbaf177 – volume: 24 start-page: 133 year: 2018 end-page: 145.e5 ident: 2025.07.06.663394v1.45 publication-title: Cell Host & Microbe doi: 10.1016/j.chom.2018.06.005 – volume: 533 start-page: 212 year: 2016 end-page: 216 ident: 2025.07.06.663394v1.48 article-title: Interconnected microbiomes and resistomes in low-income human habitats publication-title: Nature doi: 10.1038/nature17672 – volume: 14 start-page: 1023 year: 2017 end-page: 1024 ident: 2025.07.06.663394v1.10 article-title: Accessible, curated metagenomic data through ExperimentHub publication-title: Nat Methods doi: 10.1038/nmeth.4468 – volume: 9 start-page: 1 year: 2018 end-page: 13 ident: 2025.07.06.663394v1.39 article-title: A low-gluten diet induces changes in the intestinal microbiome of healthy Danish adults publication-title: Nature Communications doi: 10.1038/s41467-018-07019-x – volume: 8 start-page: 1 year: 2017 end-page: 7 ident: 2025.07.06.663394v1.51 article-title: Strain-resolved analysis of hospital rooms and infants reveals overlap between the human and room microbiome publication-title: Nature Communications doi: 10.1038/s41467-017-02018-w – volume: 52 start-page: W83 year: 2024 end-page: W94 ident: 2025.07.06.663394v1.67 article-title: The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update publication-title: Nucleic Acids Res 
doi: 10.1093/nar/gkae410 |
| SecondaryResourceType | preprint |
| SourceID | pubmedcentral, biorxiv, proquest, pubmed |
| SourceType | Open Access Repository, Aggregation Database, Index Database |
| SubjectTerms | Microbiology |
| Title | Large-scale classification of metagenomic samples: a comparative analysis of classical machine learning techniques vs a novel brain-inspired hyperdimensional computing approach |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/40672168 https://www.proquest.com/docview/3230912729 https://www.biorxiv.org/content/10.1101/2025.07.06.663394 https://pubmed.ncbi.nlm.nih.gov/PMC12265723 |