GraphFI: An Efficient Fault Injection Framework for Graph Processing on GPGPUs

Bibliographic Details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main Authors: Jiang, Nan; Yue, Hengshan; Tan, Jingweijia; Zhou, Mengting; Wang, Xiaonan; Wang, Yuchun; Wei, Wenda; Qiu, Meikang; Wei, Xiaohui
Format: Conference Proceeding
Language: English
Published: IEEE 22.06.2025
Description
Summary: As graph tasks become pervasive in real-time and safety-critical domains (e.g., financial fraud detection and electrical power systems), guaranteeing their reliable execution is as essential as pursuing high performance. However, because existing Fault Injection (FI) reliability analysis methods do not account for the graph-specific execution paradigm, they typically characterize system error resilience inaccurately, making it difficult to provide useful guidance for designing efficient and reliable graph processing paradigms. This paper proposes GraphFI, an efficient graph fault injection framework for the general-purpose parallel acceleration platform (i.e., GPGPUs). Our key insight is to progressively uncover the graph-specific error propagation and effect mechanisms, thereby avoiding blind FI trials. First, observing that iterations with similar active vertex sets exhibit similar error behavior, we propose iteration-driven GraphFI (ID-GraphFI), which selects only representative iterations for fast assessment of the error resilience profile. Second, by detecting resilience-similarity communities in the graph topology, we propose topology-driven GraphFI (TD-GraphFI), which selects only representative vertices to evaluate the overall reliability of each community. Third, by exploiting the graph-specific fault monotonicity property, we propose monotonicity-driven GraphFI (MD-GraphFI), which draws fine-grained boundaries of severe system errors so that predictable or unnecessary fault injections can be avoided. Combining all three, GraphFI reduces the system fault site space by up to two orders of magnitude, achieving a 2.1× to 15.2× speedup over state-of-the-art (SOTA) methods while providing better reliability assessment accuracy.
DOI: 10.1109/DAC63849.2025.11133241
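
The summary's first idea, that iterations with similar active vertex sets can share one representative for fault injection, can be illustrated with a small sketch. This is not the paper's algorithm: the record does not say how similarity is measured or how representatives are chosen, so the Jaccard measure, the 0.75 threshold, the greedy grouping, and all names below (`jaccard`, `pick_representative_iterations`, `active_sets`) are assumptions made purely for illustration.

```python
from typing import List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two vertex sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_representative_iterations(
    active_sets: List[Set[int]], threshold: float = 0.75
) -> List[List[int]]:
    """Greedily group iterations whose active vertex sets look alike.

    Each returned group is a list of iteration indices; the first index in a
    group is treated as its representative. Both the similarity measure and
    the threshold are illustrative assumptions, not values from the paper.
    """
    groups: List[List[int]] = []
    for it, active in enumerate(active_sets):
        for group in groups:
            # Compare against the group's representative only.
            if jaccard(active_sets[group[0]], active) >= threshold:
                group.append(it)
                break
        else:
            # No existing group is similar enough: start a new one.
            groups.append([it])
    return groups


# Hypothetical active vertex sets for five iterations of a graph algorithm.
active_sets = [
    {0, 1, 2, 3, 4},
    {0, 1, 2, 3, 4, 5},
    {6, 7},
    {6, 7},
    {0, 1, 2, 3},
]
groups = pick_representative_iterations(active_sets)
representatives = [g[0] for g in groups]
print(groups)           # groups of similar iterations
print(representatives)  # iterations that would receive fault injections
```

Under these assumptions, fault injection trials would be run only on the representative iterations, and their measured resilience would be attributed to the other iterations in the same group. A real framework would also need to validate that grouped iterations do behave alike under faults.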