GraphFI: An Efficient Fault Injection Framework for Graph Processing on GPGPUs

As graph tasks become pervasive in real-time and safety-critical domains (e.g., financial fraud detection and electrical power systems), it is also essential to guarantee their reliable execution beyond pursuing extraordinary performance. However, due to the neglect of consideration for graph-specifi...

Full description

Bibliographic details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors: Jiang, Nan; Yue, Hengshan; Tan, Jingweijia; Zhou, Mengting; Wang, Xiaonan; Wang, Yuchun; Wei, Wenda; Qiu, Meikang; Wei, Xiaohui
Format: Conference proceeding
Language: English
Published: IEEE, 22 June 2025
Subjects:
Online access: Full text
Abstract As graph tasks become pervasive in real-time and safety-critical domains (e.g., financial fraud detection and electrical power systems), guaranteeing their reliable execution is as essential as pursuing high performance. However, because existing Fault Injection (FI) reliability analysis methods neglect the graph-specific execution paradigm, they typically characterize system error resilience inaccurately, which makes it hard to provide useful guidance for designing efficient and reliable graph processing paradigms. This paper proposes GraphFI, an efficient graph fault injection framework for the general-purpose parallel task acceleration platform (i.e., GPGPUs). Our key insight is to progressively uncover the graph-specific error propagation and effect mechanisms, thereby avoiding blind FI trials. First, observing that iterations with similar active vertex sets exhibit similar error behavior, we propose iteration-driven GraphFI (ID-GraphFI), which selects only representative iterations for fast assessment of the error resilience profile. Second, by detecting resilience-similarity communities in the graph topology, we propose topology-driven GraphFI (TD-GraphFI), which selects only representative vertices to evaluate the overall reliability of each community. Third, by exploiting the graph-specific fault monotonic property, we propose monotonicity-driven GraphFI (MD-GraphFI), which draws fine-grained boundaries for severe system errors so that predictable or unnecessary fault injections can be skipped. Combining all three, GraphFI reduces the system fault site space by up to two orders of magnitude and achieves a 2.1× to 15.2× speedup over state-of-the-art (SOTA) methods while providing better reliability assessment accuracy.
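The abstract does not spell out how representative iterations are chosen or how the severe-error boundary is located, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes Jaccard similarity over active vertex sets with a hypothetical threshold of 0.6 for the iteration-driven step, and it assumes that error severity is monotonic in the flipped bit position for the monotonicity-driven step. The function names `pick_representative_iterations` and `severe_boundary` are illustrative and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two vertex sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pick_representative_iterations(
    active_sets: Sequence[Set[int]], threshold: float = 0.6
) -> List[int]:
    """Greedy grouping (assumed scheme): an iteration becomes a new
    representative only if its active vertex set is not similar enough
    to the set of any representative chosen so far."""
    reps: List[int] = []
    for i, active in enumerate(active_sets):
        if not any(jaccard(active, active_sets[r]) >= threshold for r in reps):
            reps.append(i)
    return reps


def severe_boundary(is_severe: Callable[[int], bool], num_bits: int) -> int:
    """Binary search for the lowest bit position whose flip is severe.

    Assumes monotonicity: if flipping bit b is severe, flipping any higher
    bit is severe too. Each call to is_severe stands for one fault injection
    trial. Returns num_bits when no bit position is severe.
    """
    lo, hi = 0, num_bits
    while lo < hi:
        mid = (lo + hi) // 2
        if is_severe(mid):
            hi = mid        # boundary is at mid or below
        else:
            lo = mid + 1    # boundary is strictly above mid
    return lo
```

For example, with active vertex sets `[{0, 1, 2, 3}, {0, 1, 2, 3, 4}, {7, 8}, {7, 8, 9}, {0, 1, 2, 3}]`, `pick_representative_iterations` returns `[0, 2]`, so faults would be injected in two iterations rather than five. Under the monotonicity assumption, `severe_boundary(lambda b: b >= 23, 32)` returns `23` after five trials instead of 32.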
Author Tan, Jingweijia
Qiu, Meikang
Yue, Hengshan
Wei, Xiaohui
Zhou, Mengting
Jiang, Nan
Wang, Yuchun
Wang, Xiaonan
Wei, Wenda
Author_xml – sequence: 1
  givenname: Nan
  surname: Jiang
  fullname: Jiang, Nan
  email: yuehs@jlu.edu.cn
  organization: Jilin University,College of Computer Science and Technology
– sequence: 2
  givenname: Hengshan
  surname: Yue
  fullname: Yue, Hengshan
  email: weixh@jlu.edu.cn
  organization: Jilin University,College of Computer Science and Technology
– sequence: 3
  givenname: Jingweijia
  surname: Tan
  fullname: Tan, Jingweijia
  organization: Jilin University,College of Computer Science and Technology
– sequence: 4
  givenname: Mengting
  surname: Zhou
  fullname: Zhou, Mengting
  organization: Jilin University,College of Computer Science and Technology
– sequence: 5
  givenname: Xiaonan
  surname: Wang
  fullname: Wang, Xiaonan
  organization: Jilin University,College of Computer Science and Technology
– sequence: 6
  givenname: Yuchun
  surname: Wang
  fullname: Wang, Yuchun
  organization: Jilin University,College of Computer Science and Technology
– sequence: 7
  givenname: Wenda
  surname: Wei
  fullname: Wei, Wenda
  organization: Jilin University,College of Computer Science and Technology
– sequence: 8
  givenname: Meikang
  surname: Qiu
  fullname: Qiu, Meikang
  organization: Augusta University,School of Computer and Cyber Science
– sequence: 9
  givenname: Xiaohui
  surname: Wei
  fullname: Wei, Xiaohui
  organization: Jilin University,College of Computer Science and Technology
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/DAC63849.2025.11133241
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331503048
EndPage 7
ExternalDocumentID 11133241
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  funderid: 10.13039/501100001809
– fundername: Jilin University
  funderid: 10.13039/501100004032
– fundername: National Key Research and Development Program of China
  funderid: 10.13039/501100012166
GroupedDBID 6IE
6IH
CBEJK
RIE
RIO
ID FETCH-LOGICAL-a179t-2fd0a1e03cb34c490b0a74de36c730737bb8ff8e10ddf6879a528f0f16c25bff3
IEDL.DBID RIE
IngestDate Wed Oct 01 07:05:15 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a179t-2fd0a1e03cb34c490b0a74de36c730737bb8ff8e10ddf6879a528f0f16c25bff3
PageCount 7
ParticipantIDs ieee_primary_11133241
PublicationCentury 2000
PublicationDate 2025-June-22
PublicationDateYYYYMMDD 2025-06-22
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-22
  day: 22
PublicationDecade 2020
PublicationTitle 2025 62nd ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
Score 2.295365
Snippet As graph tasks become pervasive in real-time and safety-critical domains (e.g., financial fraud detection and electrical power systems), it is also essential to...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Accuracy
Excavation
Finance
Fraud
Merging
Power system reliability
Real-time systems
Reliability engineering
Resilience
Topology
Title GraphFI: An Efficient Fault Injection Framework for Graph Processing on GPGPUs
URI https://ieeexplore.ieee.org/document/11133241
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE