SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs

Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs. Existing GPU-accelerated sparse LU factorization methods either offload relatively small dense matrix-matrix multiplications to GPU cores, or extr...

Bibliographic details
Published in: 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 37-42
Main authors: Zhao, Jianqi; Wen, Yao; Luo, Yuchen; Jin, Zhou; Liu, Weifeng; Zhou, Zhenya
Format: Conference paper
Language: English
Published: IEEE, 5 December 2021
Abstract Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs. Existing GPU-accelerated sparse LU factorization methods either offload relatively small dense matrix-matrix multiplications to GPU cores or extract level-set information to parallelize elimination operations within each level. However, because of insufficient parallelism, neither method can saturate the large number of compute units on modern GPUs. In this paper, we propose a synchronization-free sparse LU factorization algorithm called SFLU. To saturate GPU cores, our method lets each thread block eliminate a column and runs all the thread blocks at the same time. By communicating dependency information stored in global memory, each thread block either busy-waits until it can run or is updated by its preceding columns. Because the elimination of all columns proceeds concurrently, our method avoids any barrier synchronization and saturates GPU resources. Benchmarked on over 1000 sparse matrices on an NVIDIA Titan RTX GPU, SFLU outperforms SuperLU and GLU by factors of 155.71 and 8.21 on average (up to 3585.62 and 252.66), respectively.
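The column-per-worker idea in the abstract can be illustrated with a small CPU sketch. This is not the authors' GPU kernel: it assumes a dense-stored matrix, no pivoting, a left-looking update order, and one Python thread standing in for each thread block. The names `sflu_like_lu`, `eliminate_column`, and `done` are hypothetical and chosen only for this illustration. What it shows is the dependency pattern: a worker spins on a per-column flag only when its column actually depends on an earlier one, and no global barrier is used.

```python
import threading
import time


def sflu_like_lu(a):
    """In-place LU (no pivoting) of a small square matrix given as a list of rows.

    Illustrative assumption: one Python thread per column plays the role of
    one GPU thread block; `done` plays the role of flags in global memory.
    """
    n = len(a)
    done = [False] * n  # done[k] becomes True once column k is final

    def eliminate_column(j):
        # Visit earlier columns in order; only nonzero U entries create a dependency.
        for k in range(j):
            if a[k][j] == 0.0:
                continue  # column j does not depend on column k
            while not done[k]:  # busy-wait on the flag instead of a barrier
                time.sleep(0)   # yield so other workers can make progress
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]  # apply the update from column k
        pivot = a[j][j]
        for i in range(j + 1, n):
            a[i][j] /= pivot  # scale the sub-diagonal entries to form L
        done[j] = True  # publish: column j can now be read by later columns

    workers = [threading.Thread(target=eliminate_column, args=(j,)) for j in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return a


# L is stored below the diagonal (unit diagonal implied), U on and above it.
lu = sflu_like_lu([[2.0, 0.0, 4.0], [1.0, 2.0, 0.0], [0.0, 4.0, 1.0]])
print(lu)  # [[2.0, 0.0, 4.0], [0.5, 2.0, -2.0], [0.0, 2.0, 5.0]]
```

In this example column 1 never waits, because its entry in row 0 is zero, while column 2 waits on column 0 and then on column 1 once fill-in makes its row-1 entry nonzero. On a GPU the flags would presumably need atomic or fenced accesses, which this sketch does not model.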
Author Jin, Zhou
Zhou, Zhenya
Liu, Weifeng
Zhao, Jianqi
Wen, Yao
Luo, Yuchen
Author_xml – sequence: 1
  givenname: Jianqi
  surname: Zhao
  fullname: Zhao, Jianqi
  email: 13022291538@163.com
  organization: China University of Petroleum-Beijing,Super Scientific Software Laboratory,Department of Computer Science and Technology,Beijing,China
– sequence: 2
  givenname: Yao
  surname: Wen
  fullname: Wen, Yao
  email: wy044399@163.com
  organization: China University of Petroleum-Beijing,Super Scientific Software Laboratory,Department of Computer Science and Technology,Beijing,China
– sequence: 3
  givenname: Yuchen
  surname: Luo
  fullname: Luo, Yuchen
  email: 546780156@qq.com
  organization: China University of Petroleum-Beijing,Super Scientific Software Laboratory,Department of Computer Science and Technology,Beijing,China
– sequence: 4
  givenname: Zhou
  surname: Jin
  fullname: Jin, Zhou
  email: jinzhou@cup.edu.cn
  organization: China University of Petroleum-Beijing,Super Scientific Software Laboratory,Department of Computer Science and Technology,Beijing,China
– sequence: 5
  givenname: Weifeng
  surname: Liu
  fullname: Liu, Weifeng
  email: weifeng.liu@cup.edu.cn
  organization: China University of Petroleum-Beijing,Super Scientific Software Laboratory,Department of Computer Science and Technology,Beijing,China
– sequence: 6
  givenname: Zhenya
  surname: Zhou
  fullname: Zhou, Zhenya
  email: zhouzhy@mail.empyrean.com.cn
  organization: Huada Empyrean Software Co. Ltd,Beijing,China
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/DAC18074.2021.9586141
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781665432740
1665432748
EndPage 42
ExternalDocumentID 9586141
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  funderid: 10.13039/501100001809
GroupedDBID 6IE
6IH
ACM
ALMA_UNASSIGNED_HOLDINGS
CBEJK
RIE
RIO
IEDL.DBID RIE
ISICitedReferencesCount 19
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000766079700007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:28:29 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 6
ParticipantIDs ieee_primary_9586141
PublicationCentury 2000
PublicationDate 2021-Dec.-5
PublicationDateYYYYMMDD 2021-12-05
PublicationDate_xml – month: 12
  year: 2021
  text: 2021-Dec.-5
  day: 05
PublicationDecade 2020
PublicationTitle 2021 58th ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2021
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib060584060
Score 2.2830544
SourceID ieee
SourceType Publisher
StartPage 37
SubjectTerms Benchmark testing
Circuit simulation
Design automation
GPU
Graphics processing units
Instruction sets
Parallel processing
sparse LU factorization
Synchronization
Title SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs
URI https://ieeexplore.ieee.org/document/9586141
WOSCitedRecordID wos000766079700007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint