A scalable distributed workflow for accelerating long reads self-correction
| Published in: | Future generation computer systems Vol. 177; p. 108244 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.04.2026 |
| Subjects: | |
| ISSN: | 0167-739X |
| Summary: | Third-Generation Sequencing (TGS) technologies have transformed genomic research by enabling the extraction of longer nucleotide sequences (referred to as long reads) and providing deeper insights into genome structure. However, long reads are often associated with high sequencing error rates, making their correction a major challenge in many computational genomic pipelines. In this paper, we introduce HyperC, a distributed workflow designed to accelerate the execution of existing long-read self-correction tools through a hybrid parallelization strategy. By combining MPI and OpenMP, our proposal efficiently scatters and executes tasks across a distributed computing system. Optimized input data handling further reduces I/O bottlenecks and maximizes resource utilization. To assess the effectiveness of HyperC, we integrated it with CONSENT, a high-performance correction module, and conducted extensive experiments on real-world sequencing datasets. The results show significant reductions in execution time and improved scalability compared to the standalone execution of CONSENT, establishing HyperC as a robust and practical solution for high-performance genomic analysis and population-scale studies. |
|---|---|
| DOI: | 10.1016/j.future.2025.108244 |