Analyzing the performance of short-read classification tools on metagenomic samples toward proper diagnosis of diseases

Bibliographic Details
Published in: Journal of Bioinformatics and Computational Biology, Vol. 22, no. 5, p. 2450012
Main Authors: Irankhah, Leili; Khorsand, Babak; Naghibzadeh, Mahmoud; Savadi, Abdorreza
Format: Journal Article
Language: English
Published: Singapore 01.10.2024
ISSN: 1757-6334
Description
Summary: Accurate knowledge of the genomes, viruses, and bacteria that have invaded our bodies is crucial for diagnosing many human diseases. The field of bioinformatics encompasses the complex computational methods required for this purpose. Metagenomics employs next-generation sequencing (NGS) technology to study and identify microbial communities in environmental samples. This technique allows for the measurement of the relative abundance of different microbes. Various tools are available for detecting bacterial species in sequenced metagenomic samples. In this study, we focus on well-known taxonomic classification tools such as MetaPhlAn4, Centrifuge, Kraken2, and Bracken, and evaluate their performance at the species level using synthetic and real datasets. The results indicate that MetaPhlAn4 exhibited high precision in identifying species in the simulated dataset, while Kraken2 had the best area under the precision-recall curve (AUPR) performance. Centrifuge, Kraken2, and Bracken showed accurate estimation of species abundances, unlike MetaPhlAn4, which had a higher L2 distance. In the real dataset analysis, with samples from an inflammatory bowel disease (IBD) study, MetaPhlAn4 and Kraken2 had faster execution times, and the tools differed in performance at the family and species levels. [...] and [...] were highlighted as the most abundant families by Centrifuge, Kraken2, and MetaPhlAn4, with variations in abundance among the ulcerative colitis (UC), Crohn's disease (CD), and control non-IBD (CN) groups. [...] ([...]) has the highest abundance among species in the CD and UC groups in comparison with the CN group. Bracken overestimated abundance, which calls for caution when interpreting its results. The findings of this research can assist in selecting the appropriate short-read classifier, thereby aiding in the diagnosis of target diseases.
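
The summary compares classifiers by species-level precision and by the L2 distance between estimated and true abundances. As a rough illustration only, the following Python sketch shows one common way such metrics can be computed for a simulated sample with a known composition. The function names, the species labels, and the abundance values are all hypothetical, and the article's actual evaluation pipeline may differ.

```python
import math


def precision_recall(true_species, predicted_species):
    """Species-level precision and recall from presence/absence calls."""
    true_set = set(true_species)
    pred_set = set(predicted_species)
    true_positives = len(true_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


def l2_distance(true_abundance, estimated_abundance):
    """Euclidean (L2) distance between two relative-abundance profiles.

    Species missing from one profile are treated as having abundance 0.
    """
    all_species = set(true_abundance) | set(estimated_abundance)
    squared_sum = sum(
        (true_abundance.get(s, 0.0) - estimated_abundance.get(s, 0.0)) ** 2
        for s in all_species
    )
    return math.sqrt(squared_sum)


# Hypothetical ground truth and classifier output (illustrative values only).
truth = {"species_a": 0.5, "species_b": 0.3, "species_c": 0.2}
estimate = {"species_a": 0.4, "species_b": 0.3, "species_d": 0.3}

precision, recall = precision_recall(truth, estimate)
distance = l2_distance(truth, estimate)
print(f"precision={precision:.3f} recall={recall:.3f} L2={distance:.3f}")
```

A lower L2 distance means the estimated profile is closer to the true composition, which is the sense in which the summary describes MetaPhlAn4 as less accurate than the other tools on abundance estimation.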
DOI: 10.1142/S0219720024500124