GraphFI: An Efficient Fault Injection Framework for Graph Processing on GPGPUs
As graph tasks become pervasive in real-time and safety-critical domains (e.g., financial fraud detection and electrical power systems), it is also essential to guarantee their reliable execution beyond pursuing extraordinary performance. However, due to the neglect of consideration for graph-specifi...
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
| Main authors: | Jiang, Nan; Yue, Hengshan; Tan, Jingweijia; Zhou, Mengting; Wang, Xiaonan; Wang, Yuchun; Wei, Wenda; Qiu, Meikang; Wei, Xiaohui |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 22 June 2025 |
| Subjects: | |
| Online access: | Full text |
| Abstract | As graph tasks become pervasive in real-time and safety-critical domains (e.g., financial fraud detection and electrical power systems), guaranteeing their reliable execution is as essential as pursuing high performance. However, because existing Fault Injection (FI) reliability analysis methods neglect the graph-specific execution paradigm, they typically characterize system error resilience inaccurately, making it difficult to provide useful guidance for designing efficient and reliable graph processing paradigms. This paper proposes GraphFI, an efficient graph fault injection framework for the general-purpose parallel acceleration platform (i.e., GPGPUs). Our key insight is to progressively uncover graph-specific error propagation and effect mechanisms, thereby avoiding blind FI trials. First, observing that iterations with similar active vertex sets exhibit similar error behavior, we propose iteration-driven GraphFI (ID-GraphFI), which selects only representative iterations for fast error resilience profiling. Second, by detecting resilience-similarity communities in the graph topology, we propose topology-driven GraphFI (TD-GraphFI), which selects only representative vertices to evaluate the overall reliability of each community. Third, by exploiting the graph-specific fault monotonicity property, we propose monotonicity-driven GraphFI (MD-GraphFI), which draws fine-grained boundaries for severe system errors so that predictable or unnecessary fault injections can be skipped. Combining all three, GraphFI reduces the system fault site space by up to two orders of magnitude, achieving a $2.1\times$ to $15.2\times$ speedup over state-of-the-art (SOTA) methods while providing better reliability assessment accuracy. |
|---|---|
| Author | Tan, Jingweijia; Qiu, Meikang; Yue, Hengshan; Wei, Xiaohui; Zhou, Mengting; Jiang, Nan; Wang, Yuchun; Wang, Xiaonan; Wei, Wenda |
| Author_xml | 1. Jiang, Nan (email: yuehs@jlu.edu.cn; Jilin University, College of Computer Science and Technology); 2. Yue, Hengshan (email: weixh@jlu.edu.cn; Jilin University, College of Computer Science and Technology); 3. Tan, Jingweijia (Jilin University, College of Computer Science and Technology); 4. Zhou, Mengting (Jilin University, College of Computer Science and Technology); 5. Wang, Xiaonan (Jilin University, College of Computer Science and Technology); 6. Wang, Yuchun (Jilin University, College of Computer Science and Technology); 7. Wei, Wenda (Jilin University, College of Computer Science and Technology); 8. Qiu, Meikang (Augusta University, School of Computer and Cyber Science); 9. Wei, Xiaohui (Jilin University, College of Computer Science and Technology) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/DAC63849.2025.11133241 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP) 1998-present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798331503048 |
| EndPage | 7 |
| ExternalDocumentID | 11133241 |
| Genre | orig-research |
| GrantInformation_xml | National Natural Science Foundation of China (funder ID: 10.13039/501100001809); Jilin University (funder ID: 10.13039/501100004032); National Key Research and Development Program of China (funder ID: 10.13039/501100012166) |
| GroupedDBID | 6IE 6IH CBEJK RIE RIO |
| ID | FETCH-LOGICAL-a179t-2fd0a1e03cb34c490b0a74de36c730737bb8ff8e10ddf6879a528f0f16c25bff3 |
| IEDL.DBID | RIE |
| IngestDate | Wed Oct 01 07:05:15 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a179t-2fd0a1e03cb34c490b0a74de36c730737bb8ff8e10ddf6879a528f0f16c25bff3 |
| PageCount | 7 |
| ParticipantIDs | ieee_primary_11133241 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-June-22 |
| PublicationDateYYYYMMDD | 2025-06-22 |
| PublicationDate_xml | – month: 06 year: 2025 text: 2025-June-22 day: 22 |
| PublicationDecade | 2020 |
| PublicationTitle | 2025 62nd ACM/IEEE Design Automation Conference (DAC) |
| PublicationTitleAbbrev | DAC |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 2.295365 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Accuracy; Excavation; Finance; Fraud; Merging; Power system reliability; Real-time systems; Reliability engineering; Resilience; Topology |
| Title | GraphFI: An Efficient Fault Injection Framework for Graph Processing on GPGPUs |
| URI | https://ieeexplore.ieee.org/document/11133241 |
| hasFullText | 1 |
| inHoldings | 1 |
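
The iteration-driven idea in the abstract (iterations with similar active vertex sets behave similarly under faults, so only representatives need fault injection) can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard similarity measure, the greedy grouping, the `threshold` value, and the function names `jaccard` and `select_representative_iterations` are all assumptions made for illustration, since the record does not describe how GraphFI measures similarity.

```python
from typing import List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two vertex sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_representative_iterations(
    active_sets: List[Set[int]], threshold: float = 0.8
) -> List[List[int]]:
    """Greedily group iterations whose active vertex sets are similar.

    Hypothetical helper: each iteration joins the first group whose
    representative (the group's first iteration) is at least `threshold`
    similar; otherwise it starts a new group.
    """
    groups: List[List[int]] = []
    for i, vertices in enumerate(active_sets):
        for group in groups:
            if jaccard(active_sets[group[0]], vertices) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Made-up active vertex sets for five iterations of a graph kernel.
active = [{0}, {1, 2, 3}, {1, 2, 3, 4}, {5, 6}, {5, 6}]
groups = select_representative_iterations(active, threshold=0.7)
# Fault injection would then target only the first iteration of each group.
representatives = [g[0] for g in groups]
print(groups, representatives)
```

Under these assumptions, fault injection trials run only on the representative iterations, and each result is attributed to every iteration in the same group. A real framework would need a similarity measure validated against observed error behavior.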