Leveraging Difference Recurrence Relations for High-Performance GPU Genome Alignment

Genome pairwise sequence alignment is one of the most computationally intensive workloads in many genomic pipelines, often accounting for over 90% of the runtime of critical bioinformatics applications. Recent advancements in sequencing technologies keep increasing the throughput of genomic sequenci...

Bibliographic details
Published in: 2024 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 133-143
Main authors: Zeni, Alberto; Onken, Seth; Santambrogio, Marco Domenico; Samadi, Mehrzad
Format: Conference paper
Language: English
Published: ACM, 13 October 2024
Abstract Genome pairwise sequence alignment is one of the most computationally intensive workloads in many genomic pipelines, often accounting for over 90% of the runtime of critical bioinformatics applications. Recent advancements in sequencing technologies keep increasing the throughput of genomic sequencing data while decreasing the associated cost, emphasizing the need for fast and accurate software to perform sequence analysis, given the quadratic complexity of exact pairwise algorithms. In this challenging scenario, we present the first fully GPU-accelerated version of the KSW2 genome alignment library. Results show that our high-performance implementation achieves up to 1145.17 Giga Cell Updates Per Second (GCUPS) and speedups of up to 72.83× on a single NVIDIA Tesla H100 over the state-of-the-art baseline software running on two Intel Xeon Platinum 8358 processors with a total of 128 CPU threads, while preserving alignment accuracy. Using the same configuration, we demonstrate a 66.00× speedup over ksw2d-fast, a state-of-the-art improved version of one of the KSW2 algorithms. Furthermore, we compare our implementation against a recently proposed FPGA implementation of ksw2z, achieving speedups of up to 156.37× using a single H100 GPU. To further highlight the impact of our work, we integrate our accelerated kernels into minimap2, one of the most widely used state-of-the-art aligners and mappers, demonstrating runtime improvements of up to 8.51× and 8.03× using a single H100 GPU against, respectively, the baseline software and mm2-fast, an optimized version of minimap2 that integrates ksw2d-fast as its core aligner. Our design accelerates all the algorithms of the state-of-the-art KSW2 aligner suite (splice, double- and single-gap affine) and, like the original software, supports the Z-drop heuristic and banded alignment to further reduce processing time if needed.
Finally, we evaluate our application on the H100 GPU, adapting the Berkeley Roofline model for KSW2 and demonstrating that our implementation is near optimal on our target GPU architecture.
Author Onken, Seth
Samadi, Mehrzad
Zeni, Alberto
Santambrogio, Marco Domenico
Author_xml – sequence: 1
  givenname: Alberto
  surname: Zeni
  fullname: Zeni, Alberto
  email: alberto.zeni@polimi.it
  organization: Politecnico di Milano, Italy,Dipartimento di Elettronica, Informazione e Bioingegneria,Italy
– sequence: 2
  givenname: Seth
  surname: Onken
  fullname: Onken, Seth
  email: sonken@nvidia.com
  organization: NVIDIA Corporation,USA
– sequence: 3
  givenname: Marco Domenico
  surname: Santambrogio
  fullname: Santambrogio, Marco Domenico
  email: marco.santambrogio@polimi.it
  organization: Politecnico di Milano, Italy,Dipartimento di Elettronica, Informazione e Bioingegneria,Italy
– sequence: 4
  givenname: Mehrzad
  surname: Samadi
  fullname: Samadi, Mehrzad
  email: msamadi@nvidia.com
  organization: NVIDIA Corporation,USA
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3656019.3676894
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798400706318
EndPage 143
ExternalDocumentID 10807310
Genre orig-research
GroupedDBID 6IE
6IL
ACM
ALMA_UNASSIGNED_HOLDINGS
APO
CBEJK
LHSKQ
RIE
RIL
ID FETCH-LOGICAL-a289t-c114206413c620adec1c73b67e951c4437b369a5acd5a3b619eefa4962e2efed3
IEDL.DBID RIE
ISICitedReferencesCount 0
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001344829000011&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Jan 08 06:10:43 EST 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a289t-c114206413c620adec1c73b67e951c4437b369a5acd5a3b619eefa4962e2efed3
OpenAccessLink https://doi.org/10.1145/3656019.3676894
PageCount 11
ParticipantIDs ieee_primary_10807310
PublicationCentury 2000
PublicationDate 2024-Oct.-13
PublicationDateYYYYMMDD 2024-10-13
PublicationDate_xml – month: 10
  year: 2024
  text: 2024-Oct.-13
  day: 13
PublicationDecade 2020
PublicationTitle 2024 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT)
PublicationTitleAbbrev PACT
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
SSID ssib057256082
Score 2.2714589
Snippet Genome pairwise sequence alignment is one of the most computationally intensive workloads in many genomic pipelines, often accounting for over 90% of the...
SourceID ieee
SourceType Publisher
StartPage 133
SubjectTerms Accuracy
Bioinformatics
DPX
Field programmable gate arrays
Genome Alignment
Genomics
GPU
Graphics processing units
Kernel
KSW2
minimap2
Runtime
Sequential analysis
SIMD
Software
Software algorithms
Title Leveraging Difference Recurrence Relations for High-Performance GPU Genome Alignment
URI https://ieeexplore.ieee.org/document/10807310
WOSCitedRecordID wos001344829000011&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE