Leveraging Difference Recurrence Relations for High-Performance GPU Genome Alignment
Genome pairwise sequence alignment is one of the most computationally intensive workloads in many genomic pipelines, often accounting for over 90% of the runtime of critical bioinformatics applications. Recent advancements in sequencing technologies keep increasing the throughput of genomic sequenci...
| Published in: | 2024 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 133-143 |
|---|---|
| Main authors: | Zeni, Alberto; Onken, Seth; Santambrogio, Marco Domenico; Samadi, Mehrzad |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 13 October 2024 |
| Subjects: | |
| Online access: | Get full text |
| Abstract | Genome pairwise sequence alignment is one of the most computationally intensive workloads in many genomic pipelines, often accounting for over 90% of the runtime of critical bioinformatics applications. Recent advancements in sequencing technologies keep increasing the throughput of genomic sequencing data while decreasing the associated cost, emphasizing the need for fast and accurate software to perform sequence analysis, given the quadratic complexity of exact pairwise algorithms. In this challenging scenario, we present the first fully GPU-accelerated version of the KSW2 genome alignment library. Results show that our high-performance implementation achieves up to 1145.17 Giga Cell Updates Per Second (GCUPS) and speedups of up to 72.83× on a single NVIDIA Tesla H100 over the state-of-the-art baseline software running on two Intel Xeon Platinum 8358 processors with a total of 128 CPU threads, while preserving alignment accuracy. Using the same configuration, we demonstrate a 66.00× speedup over ksw2d-fast, a state-of-the-art improved version of one of the KSW2 algorithms. Furthermore, we compare our implementation against a recently proposed FPGA implementation of ksw2z, achieving speedups of up to 156.37× using a single H100 GPU. To further highlight the impact of our work, we integrate our accelerated kernels into minimap2, one of the most widely used state-of-the-art aligners and mappers, demonstrating runtime improvements of up to 8.51× and 8.03× using a single H100 GPU against the baseline software and mm2-fast, respectively; mm2-fast is an optimized version of minimap2 that integrates ksw2d-fast as its core aligner. Our design accelerates all the algorithms of the state-of-the-art KSW2 aligner suite (splice, double-gap affine, and single-gap affine) and, like the original software, supports the Z-drop heuristic and banded alignment to reduce the processing time further if needed. Finally, we evaluate our application on the H100 GPU, adapting the Berkeley Roofline model for KSW2 and demonstrating that our implementation is near optimal on our target GPU architecture. |
|---|---|
| Author | Onken, Seth; Samadi, Mehrzad; Zeni, Alberto; Santambrogio, Marco Domenico |
| Author_xml | – sequence: 1 givenname: Alberto surname: Zeni fullname: Zeni, Alberto email: alberto.zeni@polimi.it organization: Politecnico di Milano, Italy,Dipartimento di Elettronica, Informazione e Bioingegneria,Italy – sequence: 2 givenname: Seth surname: Onken fullname: Onken, Seth email: sonken@nvidia.com organization: NVIDIA Corporation,USA – sequence: 3 givenname: Marco Domenico surname: Santambrogio fullname: Santambrogio, Marco Domenico email: marco.santambrogio@polimi.it organization: Politecnico di Milano, Italy,Dipartimento di Elettronica, Informazione e Bioingegneria,Italy – sequence: 4 givenname: Mehrzad surname: Samadi fullname: Samadi, Mehrzad email: msamadi@nvidia.com organization: NVIDIA Corporation,USA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3656019.3676894 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798400706318 |
| EndPage | 143 |
| ExternalDocumentID | 10807310 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL ACM ALMA_UNASSIGNED_HOLDINGS APO CBEJK LHSKQ RIE RIL |
| ID | FETCH-LOGICAL-a289t-c114206413c620adec1c73b67e951c4437b369a5acd5a3b619eefa4962e2efed3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001344829000011&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Jan 08 06:10:43 EST 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a289t-c114206413c620adec1c73b67e951c4437b369a5acd5a3b619eefa4962e2efed3 |
| OpenAccessLink | https://doi.org/10.1145/3656019.3676894 |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_10807310 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Oct.-13 |
| PublicationDateYYYYMMDD | 2024-10-13 |
| PublicationDate_xml | – month: 10 year: 2024 text: 2024-Oct.-13 day: 13 |
| PublicationDecade | 2020 |
| PublicationTitle | 2024 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT) |
| PublicationTitleAbbrev | PACT |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssib057256082 |
| Score | 2.2714589 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 133 |
| SubjectTerms | Accuracy; Bioinformatics; DPX; Field programmable gate arrays; Genome Alignment; Genomics; GPU; Graphics processing units; Kernel; KSW2; minimap2; Runtime; Sequential analysis; SIMD; Software; Software algorithms |
| Title | Leveraging Difference Recurrence Relations for High-Performance GPU Genome Alignment |
| URI | https://ieeexplore.ieee.org/document/10807310 |
| WOSCitedRecordID | wos001344829000011&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |