SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs
Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs. Existing GPU-accelerated sparse LU factorization methods either offload relatively small dense matrix-matrix multiplications to GPU cores, or extr...
| Published in: | 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 37-42 |
|---|---|
| Main authors: | Zhao, Jianqi; Wen, Yao; Luo, Yuchen; Jin, Zhou; Liu, Weifeng; Zhou, Zhenya |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 2021-12-05 |
| Subjects: | |
| Online access: | Get full text |
| Abstract | Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs. Existing GPU-accelerated sparse LU factorization methods either offload relatively small dense matrix-matrix multiplications to GPU cores, or extract level-set information to parallelize the elimination operations in each level. However, because of insufficient parallelism, neither method can saturate the large number of compute units on modern GPUs. In this paper, we propose a synchronization-free sparse LU factorization algorithm called SFLU. To saturate GPU cores, our method lets each thread block eliminate one column and runs all the thread blocks at the same time. By communicating dependency information stored in global memory, each thread block either busy-waits until it can run or is updated by its preceding columns. Because the elimination of all columns proceeds concurrently, our method avoids any barrier synchronization and saturates GPU resources. In a benchmark of over 1000 sparse matrices on an NVIDIA Titan RTX GPU, SFLU outperforms SuperLU and GLU by average factors of 155.71 and 8.21 (up to 3585.62 and 252.66), respectively. |
|---|---|
| Author | Jin, Zhou; Zhou, Zhenya; Liu, Weifeng; Zhao, Jianqi; Wen, Yao; Luo, Yuchen |
| Author_xml | (1) Zhao, Jianqi; email: 13022291538@163.com; organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China. (2) Wen, Yao; email: wy044399@163.com; organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China. (3) Luo, Yuchen; email: 546780156@qq.com; organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China. (4) Jin, Zhou; email: jinzhou@cup.edu.cn; organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China. (5) Liu, Weifeng; email: weifeng.liu@cup.edu.cn; organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China. (6) Zhou, Zhenya; email: zhouzhy@mail.empyrean.com.cn; organization: Huada Empyrean Software Co. Ltd, Beijing, China. |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/DAC18074.2021.9586141 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9781665432740 1665432748 |
| EndPage | 42 |
| ExternalDocumentID | 9586141 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China funderid: 10.13039/501100001809 |
| GroupedDBID | 6IE 6IH ACM ALMA_UNASSIGNED_HOLDINGS CBEJK RIE RIO |
| ID | FETCH-LOGICAL-a217t-1d14b8bc1092eeb3b519a622fbdb971cdf17693b46a4efd9f825612e5cc2a0033 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 19 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000766079700007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:28:29 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a217t-1d14b8bc1092eeb3b519a622fbdb971cdf17693b46a4efd9f825612e5cc2a0033 |
| PageCount | 6 |
| ParticipantIDs | ieee_primary_9586141 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-Dec.-5 |
| PublicationDateYYYYMMDD | 2021-12-05 |
| PublicationDate_xml | – month: 12 year: 2021 text: 2021-Dec.-5 day: 05 |
| PublicationDecade | 2020 |
| PublicationTitle | 2021 58th ACM/IEEE Design Automation Conference (DAC) |
| PublicationTitleAbbrev | DAC |
| PublicationYear | 2021 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib060584060 |
| Score | 2.2830544 |
| Snippet | Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs.... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 37 |
| SubjectTerms | Benchmark testing; Circuit simulation; Design automation; GPU; Graphics processing units; Instruction sets; Parallel processing; sparse LU factorization; Synchronization |
| Title | SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs |
| URI | https://ieeexplore.ieee.org/document/9586141 |
| WOSCitedRecordID | wos000766079700007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
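
The abstract describes one thread block per column, with blocks busy-waiting on dependency information in global memory instead of meeting at barriers. The following is a minimal CPU-side sketch of that idea, not the authors' GPU kernel: it assumes a small dense matrix, no pivoting, and a left-looking update order, and it uses one Python thread per column as a stand-in for a thread block. The names `sync_free_lu`, `eliminate_column`, and `done` are hypothetical and chosen only for illustration.

```python
import threading
import time


def sync_free_lu(a):
    """Factor a square matrix in place into unit-lower L and upper U (no pivoting).

    Illustrative assumption: one worker thread per column plays the role of
    one GPU thread block; the `done` list plays the role of flags kept in
    global memory.
    """
    n = len(a)
    done = [False] * n  # done[k] becomes True once column k is finalized

    def eliminate_column(j):
        # Apply the earlier columns in increasing order, so a[k][j] is
        # already final by the time column k is reached.
        for k in range(j):
            if a[k][j] == 0.0:
                continue  # zero U entry: column j does not depend on column k
            while not done[k]:   # busy-wait on the dependency flag
                time.sleep(0)    # yield so other workers can make progress
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
        pivot = a[j][j]
        for i in range(j + 1, n):
            a[i][j] /= pivot     # scale the sub-diagonal to form column j of L
        done[j] = True           # publish: later columns may now use column j

    # Launch every column at once; no barrier separates the columns.
    workers = [threading.Thread(target=eliminate_column, args=(j,)) for j in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return a
```

Calling `sync_free_lu` on a copy of a small nonsingular matrix returns L (below the diagonal, with an implicit unit diagonal) and U (on and above the diagonal) packed into one array. Each worker writes only its own column, so the only coordination needed is the per-column flag. A real sparse GPU implementation would additionally need a sparse storage format, a symbolic phase to account for fill-in, and appropriate memory fences, none of which are shown here.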