REFINE: Realistic fault injection via compiler-based instrumentation for accuracy, portability and speed

Detailed bibliography
Published in: International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-14
Main authors: Georgakoudis, Giorgis; Laguna, Ignacio; Nikolopoulos, Dimitrios S.; Schulz, Martin
Medium: Conference paper
Language: English
Published: New York, NY, USA: ACM, 12 November 2017
Series: ACM Conferences
ISBN: 9781450351140, 145035114X
ISSN: 2167-4337
Description
Summary: Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine-code, binary-level FI because they lack access to all dynamic instructions; thus, they fail to mimic certain fault manifestations. In this paper, we study the limitations of current practices in compiler-based FI and how they impact the interpretation of results in resilience studies. We propose REFINE, a novel framework that addresses these limitations by performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach on 14 HPC programs and show that, due to its unique design, its runtime overhead is significantly smaller than that of state-of-the-art compiler-based FI frameworks, reducing the time needed for large FI experiments.
DOI:10.1145/3126908.3126972
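The summary above describes FI as corrupting the results of dynamic instructions to mimic soft errors. The C sketch below illustrates only the generic single-bit-flip fault model commonly used in such campaigns; it is not REFINE's implementation. The names `injector`, `fi_hook` and `instrumented_sum` are hypothetical, and the hook is called by hand here, whereas a compiler-based tool would insert equivalent calls automatically. Note that a hook attached only to operations visible in the intermediate representation cannot reach instructions that appear only after code generation (for example, register spill and reload code), which is the accuracy gap the summary refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Single-bit-flip fault model: invert one bit of a 64-bit result. */
static uint64_t flip_bit(uint64_t value, unsigned bit)
{
    assert(bit < 64);
    return value ^ (UINT64_C(1) << bit);
}

/* Hypothetical injector state: counts dynamic instrumented operations and
   corrupts the result of the one whose index equals `target`. */
typedef struct {
    uint64_t executed; /* instrumented operations executed so far */
    uint64_t target;   /* index of the operation to corrupt */
    unsigned bit;      /* bit position to flip */
    int injected;      /* set to 1 once the fault has been injected */
} injector;

/* Hook placed after each instrumented operation; returns the (possibly
   corrupted) result and advances the dynamic operation counter. */
static uint64_t fi_hook(injector *fi, uint64_t result)
{
    if (!fi->injected && fi->executed == fi->target) {
        result = flip_bit(result, fi->bit);
        fi->injected = 1;
    }
    fi->executed++;
    return result;
}

/* Toy kernel: sum of 0..n-1, with every addition routed through the hook. */
static uint64_t instrumented_sum(injector *fi, uint64_t n)
{
    uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++)
        acc = fi_hook(fi, acc + i);
    return acc;
}
```

Running `instrumented_sum` with `target` set beyond the last operation gives the fault-free ("golden") result, 45 for `n = 10`; injecting at operation 3 with `bit = 4` adds 16 to a partial sum and yields 61. Comparing such faulty outputs against the golden run is how an FI campaign classifies outcomes.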