Bibliographic Details
| Title: |
MTBseq-nf: Enabling Scalable Tuberculosis Genomics "Big Data" Analysis Through a User-Friendly Nextflow Wrapper for MTBseq Pipeline. |
| Authors: |
Sharma, Abhinav, Marcon, Davi Josué, Loubser, Johannes, Lima, Karla Valéria Batista, van der Spuy, Gian, Conceição, Emilyn Costa |
| Source: |
Microorganisms; Dec 2025, Vol. 13 Issue 12, p2685, 16p |
| Subject Terms: |
TUBERCULOSIS, GENOMICS, DRUG resistance, NUCLEOTIDE sequencing, WORKFLOW management, BIOINFORMATICS, WHOLE genome sequencing, PARALLEL algorithms |
| Abstract: |
The MTBseq pipeline, published in 2018, was designed to address bioinformatics challenges in tuberculosis (TB) research using whole-genome sequencing (WGS) data. It was the first publicly available tool on GitHub to perform full analysis of WGS data for the Mycobacterium tuberculosis complex (MTBC), encompassing quality control through mapping, variant calling for lineage classification, drug resistance prediction, and phylogenetic inference. However, the pipeline's architecture is not well suited to high-performance computing or cloud computing environments, where analyses often involve large datasets. To overcome this limitation, we developed MTBseq-nf, a Nextflow wrapper that adds parallelization for faster execution along with several other significant enhancements. Unlike the linear, batched analysis of samples in the TBfull step of the MTBseq pipeline, the MTBseq-nf wrapper can run several instances of the same step in parallel, fully utilizing the available resources. To evaluate scalability and reproducibility, we benchmarked the tools on a dedicated computational server using 90 M. tuberculosis genomes (European Nucleotide Archive [ENA] accession PRJEB7727). In our benchmarks, MTBseq-nf in its parallel mode is at least twice as fast as the standard MTBseq pipeline for cohorts exceeding 20 samples. Through integration with the best practices of the nf-core, Bioconda, and Biocontainers projects, MTBseq-nf ensures reproducibility and platform independence, providing a scalable and efficient solution for TB genomic surveillance. [ABSTRACT FROM AUTHOR] |
|
Copyright of Microorganisms is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: |
Biomedical Index |