SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs

Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs. Existing GPU-accelerated sparse LU factorization methods either offload relatively small dense matrix-matrix multiplications to GPU cores, or extr...

Bibliographic details
Published in:	2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 37-42
Main authors:	Zhao, Jianqi; Wen, Yao; Luo, Yuchen; Jin, Zhou; Liu, Weifeng; Zhou, Zhenya
Format:	Conference paper
Language:	English
Published:	IEEE, 5 December 2021
Subjects:	Benchmark testing; Circuit simulation; Design automation; GPU; Graphics processing units; Instruction sets; Parallel processing; sparse LU factorization; Synchronization

Abstract	Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs. Existing GPU-accelerated sparse LU factorization methods either offload relatively small dense matrix-matrix multiplications to GPU cores, or extract level-set information to parallelize elimination operations in each level. However, because of insufficient parallelism, neither method can saturate the large number of compute units on modern GPUs. In this paper, we propose a synchronization-free sparse LU factorization algorithm called SFLU. To saturate GPU cores, our method lets each thread block eliminate a column and runs all the thread blocks at the same time. By communicating dependency information stored in global memory, all the thread blocks either busy-wait to run or get updated by their previous columns. Because the elimination of all columns proceeds concurrently, our method avoids any barrier synchronization and saturates GPU resources. In a benchmark of over 1000 sparse matrices on an NVIDIA Titan RTX GPU, SFLU outperforms SuperLU and GLU by average factors of 155.71 and 8.21 (up to 3585.62 and 252.66), respectively.
Author	Jin, Zhou; Zhou, Zhenya; Liu, Weifeng; Zhao, Jianqi; Wen, Yao; Luo, Yuchen
Author_xml	– sequence: 1 givenname: Jianqi surname: Zhao fullname: Zhao, Jianqi email: 13022291538@163.com organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China
	– sequence: 2 givenname: Yao surname: Wen fullname: Wen, Yao email: wy044399@163.com organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China
	– sequence: 3 givenname: Yuchen surname: Luo fullname: Luo, Yuchen email: 546780156@qq.com organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China
	– sequence: 4 givenname: Zhou surname: Jin fullname: Jin, Zhou email: jinzhou@cup.edu.cn organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China
	– sequence: 5 givenname: Weifeng surname: Liu fullname: Liu, Weifeng email: weifeng.liu@cup.edu.cn organization: China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology, Beijing, China
	– sequence: 6 givenname: Zhenya surname: Zhou fullname: Zhou, Zhenya email: zhouzhy@mail.empyrean.com.cn organization: Huada Empyrean Software Co. Ltd, Beijing, China
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/DAC18074.2021.9586141
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9781665432740 1665432748
EndPage	42
ExternalDocumentID	9586141
Genre	orig-research
GrantInformation_xml	– fundername: National Natural Science Foundation of China funderid: 10.13039/501100001809
GroupedDBID	6IE 6IH ACM ALMA_UNASSIGNED_HOLDINGS CBEJK RIE RIO
ID	FETCH-LOGICAL-a217t-1d14b8bc1092eeb3b519a622fbdb971cdf17693b46a4efd9f825612e5cc2a0033
IEDL.DBID	RIE
ISICitedReferencesCount	19
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000766079700007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate	Wed Aug 27 02:28:29 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-a217t-1d14b8bc1092eeb3b519a622fbdb971cdf17693b46a4efd9f825612e5cc2a0033
PageCount	6
ParticipantIDs	ieee_primary_9586141
PublicationCentury	2000
PublicationDate	2021-Dec.-5
PublicationDateYYYYMMDD	2021-12-05
PublicationDate_xml	– month: 12 year: 2021 text: 2021-Dec.-5 day: 05
PublicationDecade	2020
PublicationTitle	2021 58th ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev	DAC
PublicationYear	2021
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssib060584060
Score	2.2830544
SourceID	ieee
SourceType	Publisher
StartPage	37
SubjectTerms	Benchmark testing; Circuit simulation; Design automation; GPU; Graphics processing units; Instruction sets; Parallel processing; sparse LU factorization; Synchronization
Title	SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs
URI	https://ieeexplore.ieee.org/document/9586141
WOSCitedRecordID	wos000766079700007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
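
Illustrative sketch (not part of the catalog record, and not the authors' code). The abstract describes one worker per column, all workers launched at once, with dependency information kept in shared memory and busy-waiting in place of barriers. The C++ sketch below is a CPU analogue of that idea under several assumptions: `std::thread` stands in for a GPU thread block, a vector of atomic flags stands in for the dependency information in global memory, storage is dense column-major, there is no pivoting, and each column pulls updates from earlier columns (a left-looking variant chosen for simplicity; the abstract does not say whether SFLU pulls or pushes updates). The names `Matrix` and `lu_sync_free` are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Dense column-major storage keeps the sketch short.
// A real sparse solver would use a compressed format instead.
struct Matrix {
    std::size_t n;
    std::vector<double> a;  // a[j * n + i] holds entry (i, j)
    double& at(std::size_t i, std::size_t j) { return a[j * n + i]; }
};

// Factor the matrix in place: afterwards the strict lower triangle holds L
// (unit diagonal implied) and the upper triangle holds U.
// Assumption: every pivot is nonzero, so no pivoting is performed.
void lu_sync_free(Matrix& m) {
    const std::size_t n = m.n;

    // done[k] becomes true once column k is fully eliminated.
    std::vector<std::atomic<bool>> done(n);
    for (auto& d : done) d.store(false);

    std::vector<std::thread> workers;
    for (std::size_t j = 0; j < n; ++j) {
        // One worker per column; all workers start immediately, no barriers.
        workers.emplace_back([&m, &done, j, n] {
            for (std::size_t k = 0; k < j; ++k) {
                // Entry (k, j) is final here: only this worker writes
                // column j, and rows < k were handled in earlier iterations.
                const double ukj = m.at(k, j);
                if (ukj == 0.0) continue;  // column j does not depend on k

                // Busy-wait until column k has been eliminated.
                while (!done[k].load(std::memory_order_acquire)) {
                    std::this_thread::yield();
                }
                // Apply the update from column k to column j.
                for (std::size_t i = k + 1; i < n; ++i) {
                    m.at(i, j) -= m.at(i, k) * ukj;
                }
            }
            // Scale the sub-diagonal part by the pivot to obtain L(:, j).
            const double pivot = m.at(j, j);
            for (std::size_t i = j + 1; i < n; ++i) {
                m.at(i, j) /= pivot;
            }
            // Publish column j to every worker waiting on it.
            done[j].store(true, std::memory_order_release);
        });
    }
    for (auto& w : workers) w.join();
}
```

Each worker waits only on lower-numbered columns, so column 0 never waits and progress is guaranteed as long as the operating system schedules every thread. On a GPU the corresponding requirement is that every waiting thread block is resident at the same time as the blocks it waits for, which this CPU sketch does not model.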