Large-scale classification of metagenomic samples: a comparative analysis of classical machine learning techniques vs a novel brain-inspired hyperdimensional computing approach


Detailed description

Bibliographic details
Published in: bioRxiv
Main authors: Joshi, Jayadev; Cumbo, Fabio; Blankenberg, Daniel
Format: Journal Article Paper
Language: English
Published: United States, Cold Spring Harbor Laboratory, 7 July 2025
Edition: 1.1
ISSN: 2692-8205
Online access: Full text
Description
Summary: Classical machine learning techniques have revolutionized bioinformatics, enabling researchers to extract knowledge from complex biological data. However, these techniques often struggle with high-dimensional data, where the increasing number of features degrades performance and model accuracy. To address this problem, we explore hyperdimensional computing (HDC), an emerging brain-inspired computational paradigm that leverages high-dimensional vectors and simple arithmetic operations to represent and manipulate complex patterns, as an alternative approach in the context of supervised machine learning. In this work, we present a comprehensive comparative analysis of HDC against established machine learning techniques across a range of classification tasks. As a representative use case, we focus on classifying heterogeneous metagenomic samples based on their quantitative microbial profiles, using publicly available microbiome datasets. Our results demonstrate that HDC achieves comparable and, in some cases, superior classification accuracy to classical methods. Furthermore, our findings highlight the potential of HDC for improved computational efficiency, particularly when dealing with large-scale datasets, suggesting the HDC-based classifier as a promising tool for bioinformatics research, especially in areas characterized by high-dimensional data. We also offer a Galaxy-powered toolset for analyzing your own datasets, generating reproducible workflows, and adopting these methods in your own research with ease. Our investigation into the application of an HDC-based supervised machine learning technique for classifying microbial profiles in metagenomic samples yielded promising results, demonstrating the potential of this novel computational paradigm to complement and, in some cases, surpass the performance of well-established machine learning techniques.
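Illustrative note: the summary describes HDC only in general terms (high-dimensional vectors combined with simple arithmetic), so the following is a minimal sketch of one common HDC classification scheme, not the authors' implementation. It assumes a record-based encoding in which each feature (for example, a taxon) gets a random bipolar hypervector, each quantized abundance gets a level hypervector, the two are bound by elementwise multiplication, and the results are bundled by summation. All names, the dimensionality, the number of levels, and the toy profiles are hypothetical.

```python
import random

DIM = 2000  # hypervector dimensionality (assumed value for this sketch)


def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def level_hvs(rng, n_levels):
    """Level hypervectors (n_levels >= 2): nearby levels stay similar."""
    hv = random_hv(rng)
    levels = [hv[:]]
    flips_per_step = DIM // (2 * (n_levels - 1))
    order = list(range(DIM))
    rng.shuffle(order)
    for step in range(n_levels - 1):
        hv = hv[:]
        # Flip a fresh slice of positions at every step.
        for i in order[step * flips_per_step:(step + 1) * flips_per_step]:
            hv[i] = -hv[i]
        levels.append(hv)
    return levels


def encode(profile, feature_ids, levels):
    """Bind each feature ID to its abundance level, then bundle by summing."""
    acc = [0] * DIM
    for fid, abundance in zip(feature_ids, profile):
        # Abundances are assumed to be scaled to [0, 1].
        q = min(int(abundance * len(levels)), len(levels) - 1)
        lvl = levels[q]
        for i in range(DIM):
            acc[i] += fid[i] * lvl[i]
    return acc


def train(samples, labels, feature_ids, levels):
    """Build one prototype per class by summing its encoded samples."""
    prototypes = {}
    for profile, label in zip(samples, labels):
        enc = encode(profile, feature_ids, levels)
        proto = prototypes.setdefault(label, [0] * DIM)
        for i in range(DIM):
            proto[i] += enc[i]
    return prototypes


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(profile, prototypes, feature_ids, levels):
    """Return the label of the most similar class prototype."""
    enc = encode(profile, feature_ids, levels)
    return max(prototypes, key=lambda label: cosine(enc, prototypes[label]))


# Toy data: four hypothetical taxa, two hypothetical sample classes.
rng = random.Random(0)
feature_ids = [random_hv(rng) for _ in range(4)]
levels = level_hvs(rng, n_levels=10)
samples = [
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
]
labels = ["A", "A", "B", "B"]
prototypes = train(samples, labels, feature_ids, levels)
print(predict([0.85, 0.9, 0.05, 0.1], prototypes, feature_ids, levels))
```

Training in this sketch is a single pass of additions per sample, which is one reason HDC is often discussed in terms of computational efficiency; whether that matches the efficiency gains reported in the preprint depends on details the summary does not give.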
Bibliography: Working paper/preprint
Competing Interest Statement: Daniel Blankenberg has a significant financial interest in GalaxyWorks, a company that may have a commercial interest in the results of this research and technology. This potential conflict of interest has been reviewed and is managed by the Cleveland Clinic.
DOI: 10.1101/2025.07.06.663394