A scalable distributed workflow for accelerating long reads self-correction

Published in: Future generation computer systems, Volume 177; p. 108244
Main authors: Ceccaroni, Riccardo; Di Rocco, Lorenzo; Ferraro Petrillo, Umberto; Brutti, Pierpaolo
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 01.04.2026
ISSN: 0167-739X
Description
Summary: Third-Generation Sequencing (TGS) technologies have transformed genomic research by enabling the extraction of longer nucleotide sequences (referred to as long reads) and providing deeper insights into genome structure. However, long reads are often associated with high sequencing error rates, making their correction a major challenge in many computational genomic pipelines. In this paper, we introduce HyperC, a distributed workflow designed to accelerate the execution of existing long-read self-correction tools through a hybrid parallelization strategy. By combining MPI and OpenMP, our proposal efficiently scatters and executes tasks across a distributed computing system. Optimized input data handling further reduces I/O bottlenecks and maximizes resource utilization. To assess the effectiveness of HyperC, we integrated it with CONSENT, a high-performance correction module, and conducted extensive experiments on real-world sequencing datasets. The results show significant reductions in execution time and improved scalability compared to the standalone execution of CONSENT, establishing HyperC as a robust and practical solution for high-performance genomic analysis and population-scale studies.
DOI: 10.1016/j.future.2025.108244