A scalable distributed workflow for accelerating long reads self-correction

Bibliographic Details
Published in: Future generation computer systems, Vol. 177; p. 108244
Main Authors: Ceccaroni, Riccardo, Di Rocco, Lorenzo, Ferraro Petrillo, Umberto, Brutti, Pierpaolo
Format: Journal Article
Language: English
Published: Elsevier B.V. 1 April 2026
ISSN:0167-739X
Description
Summary: Third-Generation Sequencing (TGS) technologies have transformed genomic research by enabling the extraction of longer nucleotide sequences (referred to as long reads) and providing deeper insights into genome structure. However, long reads are often associated with high sequencing error rates, making their correction a major challenge in many computational genomic pipelines. In this paper, we introduce HyperC, a distributed workflow designed to accelerate the execution of existing long-read self-correction tools through a hybrid parallelization strategy. By combining MPI and OpenMP, our proposal efficiently scatters and executes tasks across a distributed computing system. Optimized input data handling further reduces I/O bottlenecks and maximizes resource utilization. To assess the effectiveness of HyperC, we integrated it with CONSENT, a high-performance correction module, and conducted extensive experiments on real-world sequencing datasets. The results show significant reductions in execution time and improved scalability compared to the standalone execution of CONSENT, establishing HyperC as a robust and practical solution for high-performance genomic analysis and population-scale studies.
DOI:10.1016/j.future.2025.108244