PVGwfa: a multi-level parallel sequence-to-graph alignment algorithm

Bibliographic details
Published in: The Journal of Supercomputing, Volume 81, Issue 5, p. 743
Main authors: Peng, Chenchen; Xia, Zeyu; Tang, Shengbo; Guo, Yifei; Yang, Canqun; Tang, Tao; Cui, Yingbo
Format: Journal Article
Language: English
Published: New York, Springer Nature B.V., 15 April 2025
ISSN: 1573-0484, 0920-8542
Description
Summary: A pangenome graph represents the genomes of multiple individuals, offering a comprehensive reference and overcoming allele bias from linear reference genomes. Sequence-to-graph alignment, crucial for pangenome tasks, aligns sequences to a graph to find the best matches. However, existing algorithms struggle with large-scale sequences. In this paper, we propose PVGwfa, a multi-level parallel sequence-to-graph alignment algorithm. We first employ MPI and Pthread for multi-process and multi-thread parallelization. Next, we introduce a hybrid load balancing strategy for better performance. Additionally, we vectorize the core of PVGwfa using SIMD instructions to accelerate sequence alignment. Experiments on real and simulated datasets show that PVGwfa reduces computation time from nearly an hour to a few minutes. For large datasets, PVGwfa achieved speedups ranging from 1.98× to 100.44× as the number of processes increased from 2 to 128, while maintaining consistent alignment results. The PVGwfa tool and source code are publicly available at https://github.com/nudt-bioinfo/PVGwfa.git.
DOI: 10.1007/s11227-025-07184-z
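
Note on the reported speedups: the summary quotes speedups of 1.98× at 2 processes and 100.44× at 128 processes. A common way to read such figures is parallel efficiency, meaning speedup divided by the number of processes. The short sketch below works through that arithmetic for the two quoted endpoints. It assumes the speedups are measured against a single-process baseline, which the summary does not state, and the function name `parallel_efficiency` is illustrative only and is not taken from the PVGwfa code.

```python
def parallel_efficiency(speedup: float, processes: int) -> float:
    """Return speedup divided by process count (1.0 means ideal linear scaling)."""
    if processes <= 0:
        raise ValueError("processes must be positive")
    return speedup / processes


# Endpoints quoted in the summary: (process count, reported speedup).
# Assumption: both speedups are relative to a single-process run.
reported = [(2, 1.98), (128, 100.44)]

for processes, speedup in reported:
    eff = parallel_efficiency(speedup, processes)
    print(f"{processes:>3} processes: speedup {speedup:.2f}x, efficiency {eff:.1%}")
```

Under that assumption, efficiency is about 99% at 2 processes and about 78% at 128 processes. Intermediate process counts are not given in the summary, so nothing can be said here about how efficiency changes between those two points.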