PVGwfa: a multi-level parallel sequence-to-graph alignment algorithm

Detailed bibliography
Published in: The Journal of Supercomputing, Vol. 81, No. 5, p. 743
Main authors: Peng, Chenchen; Xia, Zeyu; Tang, Shengbo; Guo, Yifei; Yang, Canqun; Tang, Tao; Cui, Yingbo
Format: Journal Article
Language: English
Publisher: New York: Springer Nature B.V., 15 April 2025
ISSN:1573-0484, 0920-8542, 1573-0484
Description
Summary: A pangenome graph represents the genomes of multiple individuals, offering a comprehensive reference and overcoming the allele bias introduced by linear reference genomes. Sequence-to-graph alignment, a crucial step in pangenome analyses, aligns sequences to a graph to find the best matches. However, existing algorithms struggle with large-scale sequence data. In this paper, we propose PVGwfa, a multi-level parallel sequence-to-graph alignment algorithm. We first use MPI and Pthreads for multi-process and multi-threaded parallelization. Next, we introduce a hybrid load-balancing strategy to improve performance. Additionally, we vectorize the core of PVGwfa with SIMD instructions to accelerate sequence alignment. Experiments on real and simulated datasets show that PVGwfa reduces computation time from nearly an hour to a few minutes. For large datasets, PVGwfa achieved speedups ranging from 1.98× to 100.44× as the number of processes increased from 2 to 128, while producing consistent alignment results. The PVGwfa tool and source code are publicly available at https://github.com/nudt-bioinfo/PVGwfa.git.
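To put the reported scaling in perspective, parallel efficiency (speedup divided by the number of processes) can be computed from the two endpoints given in the summary. This is a minimal sketch of that arithmetic only; it assumes the speedups are measured against a single-process baseline, which the summary does not state explicitly, and the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup: float, processes: int) -> float:
    """Parallel efficiency E = S / p: the fraction of ideal linear speedup achieved.

    Assumes (not stated in the summary) that speedup S is relative to a 1-process run.
    """
    if processes <= 0:
        raise ValueError("processes must be positive")
    return speedup / processes


# Endpoints of the speedup range reported in the summary: processes -> speedup.
reported = {2: 1.98, 128: 100.44}

for p, s in reported.items():
    print(f"{p:>3} processes: speedup {s:6.2f}x, efficiency {parallel_efficiency(s, p):.1%}")
```

Under that assumption, efficiency is about 99.0% at 2 processes and about 78.5% at 128 processes, so the scaling stays well above half of ideal linear speedup even at the largest process count reported.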
DOI:10.1007/s11227-025-07184-z