REFINE realistic fault injection via compiler-based instrumentation for accuracy, portability and speed
Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine code, bina...
Gespeichert in:
| Veröffentlicht in: | International Conference for High Performance Computing, Networking, Storage and Analysis (Online) S. 1 - 14 |
|---|---|
| Hauptverfasser: | , , , |
| Format: | Tagungsbericht |
| Sprache: | Englisch |
| Veröffentlicht: |
New York, NY, USA
ACM
12.11.2017
|
| Schriftenreihe: | ACM Conferences |
| Schlagworte: | |
| ISBN: | 9781450351140, 145035114X |
| ISSN: | 2167-4337 |
| Online-Zugang: | Volltext |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Zusammenfassung: | Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine code, binary-level FI because they lack access to all dynamic instructions, thus they fail to mimic certain fault manifestations. In this paper, we study the limitations of current practices in compiler-based FI and how they impact the interpretation of results in resilience studies.
We propose REFINE, a novel framework that addresses these limitations, performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI, while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach in 14 HPC programs and show that, due to our unique design, its runtime overhead is significantly smaller than state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments. |
|---|---|
| ISBN: | 9781450351140 145035114X |
| ISSN: | 2167-4337 |
| DOI: | 10.1145/3126908.3126972 |

