Leveraging Difference Recurrence Relations for High-Performance GPU Genome Alignment

Genome pairwise sequence alignment is one of the most computationally intensive workloads in many genomic pipelines, often accounting for over 90% of the runtime of critical bioinformatics applications. Recent advancements in sequencing technologies keep increasing the throughput of genomic sequenci...


Detailed bibliography
Published in:	2024 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 133-143
Main authors:	Zeni, Alberto; Onken, Seth; Santambrogio, Marco Domenico; Samadi, Mehrzad
Format:	Conference paper
Language:	English
Published:	ACM, 13 October 2024
Subjects:	Accuracy; Bioinformatics; DPX; Field programmable gate arrays; Genome Alignment; Genomics; GPU; Graphics processing units; Kernel; KSW2; minimap2; Runtime; Sequential analysis; SIMD; Software; Software algorithms

Abstract	Genome pairwise sequence alignment is one of the most computationally intensive workloads in many genomic pipelines, often accounting for over 90% of the runtime of critical bioinformatics applications. Recent advancements in sequencing technologies keep increasing the throughput of genomic sequencing data while decreasing the associated cost, emphasizing the need for fast and accurate software to perform sequence analysis, given the quadratic complexity of exact pairwise algorithms. In this challenging scenario, we present the first fully GPU-accelerated version of the KSW2 genome alignment library. Results show that our high-performance implementation achieves up to 1145.17 Giga Cell Updates Per Second (GCUPS) and speedups of up to 72.83× on a single NVIDIA Tesla H100 over the state-of-the-art baseline software running on two Intel Xeon Platinum 8358 processors with a total of 128 CPU threads, while preserving alignment accuracy. Using the same configuration, we demonstrate a 66.00× speedup over ksw2d-fast, a state-of-the-art improved version of one of the KSW2 algorithms. Furthermore, we compare our implementation against a recently proposed FPGA implementation of ksw2z, achieving speedups of up to 156.37× using a single H100 GPU. To further highlight the impact of our work, we integrate our accelerated kernels into minimap2, one of the most widely used state-of-the-art aligners and mappers, demonstrating runtime improvements of up to 8.51× and 8.03× using a single H100 GPU against the baseline software and mm2-fast, respectively; mm2-fast is an optimized version of minimap2 that integrates ksw2d-fast as its core aligner. Our design accelerates all the algorithms of the state-of-the-art KSW2 aligner suite (splice, double- and single-gap affine) and, like the original software, supports the Z-drop heuristic and banded alignment to reduce the processing time further if needed. Finally, we evaluate our application on the H100 GPU, adapting the Berkeley Roofline model for KSW2 and demonstrating that our implementation is near optimal on our target GPU architecture.
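Note on the GCUPS metric: the abstract reports throughput in Giga Cell Updates Per Second. The following is a minimal sketch of how such a figure is commonly computed, assuming the usual definition in which each query/target character pair counts as one cell of the full quadratic dynamic-programming matrix (no banding or Z-drop pruning). The function name `gcups` and the example values are hypothetical illustrations and are not taken from the paper.

```python
def gcups(query_len: int, target_len: int, seconds: float, num_pairs: int = 1) -> float:
    """Return throughput in Giga Cell Updates Per Second.

    Assumption: every alignment fills the full query_len x target_len
    matrix, which is where the quadratic cost of exact pairwise
    alignment comes from.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cells = query_len * target_len * num_pairs  # total DP cells updated
    return cells / (seconds * 1e9)              # scale to giga-cells per second


if __name__ == "__main__":
    # Hypothetical workload: 10 pairs of 10,000 x 10,000 bases aligned in 1 s.
    print(gcups(10_000, 10_000, 1.0, num_pairs=10))
```

With banded alignment or Z-drop enabled, fewer cells are actually visited, so a reported GCUPS value depends on whether the skipped cells are counted; this sketch counts the full matrix.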
Author	Onken, Seth Samadi, Mehrzad Zeni, Alberto Santambrogio, Marco Domenico
Author_xml	1. Zeni, Alberto (alberto.zeni@polimi.it), Politecnico di Milano, Italy, Dipartimento di Elettronica, Informazione e Bioingegneria, Italy
	2. Onken, Seth (sonken@nvidia.com), NVIDIA Corporation, USA
	3. Santambrogio, Marco Domenico (marco.santambrogio@polimi.it), Politecnico di Milano, Italy, Dipartimento di Elettronica, Informazione e Bioingegneria, Italy
	4. Samadi, Mehrzad (msamadi@nvidia.com), NVIDIA Corporation, USA
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1145/3656019.3676894
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9798400706318
EndPage	143
ExternalDocumentID	10807310
Genre	orig-research
GroupedDBID	6IE 6IL ACM ALMA_UNASSIGNED_HOLDINGS APO CBEJK LHSKQ RIE RIL
IEDL.DBID	RIE
ISICitedReferencesCount	0
IngestDate	Wed Jan 08 06:10:43 EST 2025
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
OpenAccessLink	https://doi.org/10.1145/3656019.3676894
PageCount	11
ParticipantIDs	ieee_primary_10807310
PublicationCentury	2000
PublicationDate	2024-Oct.-13
PublicationDateYYYYMMDD	2024-10-13
PublicationDate_xml	– month: 10 year: 2024 text: 2024-Oct.-13 day: 13
PublicationDecade	2020
PublicationTitle	2024 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT)
PublicationTitleAbbrev	PACT
PublicationYear	2024
Publisher	ACM
Publisher_xml	– name: ACM
SourceID	ieee
SourceType	Publisher
StartPage	133
URI	https://ieeexplore.ieee.org/document/10807310