REFINE: Realistic Fault Injection via Compiler-based Instrumentation for Accuracy, Portability and Speed

Bibliographic details
Published in: International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-14
Main authors: Georgakoudis, Giorgis; Laguna, Ignacio; Nikolopoulos, Dimitrios S.; Schulz, Martin
Format: Conference paper
Language: English
Published: New York, NY, USA: ACM, 12 November 2017
Series: ACM Conferences
Subjects:
ISBN: 9781450351140, 145035114X
ISSN: 2167-4337
Online access: Get full text
Abstract: Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine-code, binary-level FI because they lack access to all dynamic instructions, and thus they fail to mimic certain fault manifestations. In this paper, we study the limitations of current practices in compiler-based FI and how they impact the interpretation of results in resilience studies. We propose REFINE, a novel framework that addresses these limitations, performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI, while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach on 14 HPC programs and show that, due to our unique design, its runtime overhead is significantly smaller than that of state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments.
CCS Concepts: Computing methodologies → Simulation tools; Model verification and validation. Software and its engineering → Compilers. Hardware → Analysis and design of emerging devices and systems.
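As an illustration of what a fault-injection experiment perturbs, the sketch below models a soft error as a single bit flip in a value held by a running computation. It is not REFINE's implementation: the abstract describes instrumentation in a compiler backend, whereas this is a plain Python analogy. The single-bit-flip fault model and the names flip_bit, flip_bit_in_double and dot are assumptions introduced here for illustration only.

```python
import struct


def flip_bit(value: int, bit: int, width: int = 64) -> int:
    """Return `value` with one bit inverted, truncated to `width` bits."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask


def flip_bit_in_double(x: float, bit: int) -> float:
    """Flip one bit of the IEEE 754 binary64 encoding of `x`."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (faulty,) = struct.unpack("<d", struct.pack("<Q", flip_bit(raw, bit)))
    return faulty


def dot(a, b, fault_at=None, bit=0):
    """Dot product of `a` and `b`.

    If `fault_at` is an iteration index, the accumulator is corrupted
    with a single bit flip right after that iteration (hypothetical
    injection point, chosen only for this example).
    """
    acc = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        acc += x * y
        if i == fault_at:
            acc = flip_bit_in_double(acc, bit)
    return acc
```

Comparing a fault-free run such as dot([1.0, 2.0], [3.0, 4.0]) with a run that passes fault_at and bit shows how the position of the flipped bit and the moment of injection change the final result, which is the kind of outcome a resilience study records.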
Author Schulz, Martin
Laguna, Ignacio
Georgakoudis, Giorgis
Nikolopoulos, Dimitrios S.
Author_xml – sequence: 1
  givenname: Giorgis
  surname: Georgakoudis
  fullname: Georgakoudis, Giorgis
  email: g.georgakoudis@qub.ac.uk
  organization: Queen's University Belfast, Belfast, United Kingdom
– sequence: 2
  givenname: Ignacio
  surname: Laguna
  fullname: Laguna, Ignacio
  email: ilaguna@llnl.gov
  organization: Lawrence Livermore National Laboratory
– sequence: 3
  givenname: Dimitrios S.
  surname: Nikolopoulos
  fullname: Nikolopoulos, Dimitrios S.
  email: d.nikolopoulos@qub.ac.uk
  organization: Queen's University Belfast, Belfast, United Kingdom
– sequence: 4
  givenname: Martin
  surname: Schulz
  fullname: Schulz, Martin
  email: schulzm@in.tum.de
  organization: Lawrence Livermore National Laboratory and Technische Universität München, Munich, Germany
ContentType Conference Proceeding
Copyright 2017 ACM
Copyright_xml – notice: 2017 ACM
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3126908.3126972
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9781450351140
145035114X
EISSN 2167-4337
EndPage 14
ExternalDocumentID 9926257
Genre orig-research
GrantInformation_xml – fundername: European Commission
  funderid: 10.13039/501100000780
– fundername: U.S. Department of Energy
  funderid: 10.13039/100000015
IEDL.DBID RIE
ISBN 9781450351140
145035114X
ISICitedReferencesCount 26
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000458161700029&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:19:15 EDT 2025
Wed Jan 31 06:44:57 EST 2024
Wed Jan 31 06:44:12 EST 2024
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Keywords fault injection
compiler-based instrumentation
resilience
high-performance computing
Language English
License Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org
LinkModel DirectLink
MeetingName SC '17: The International Conference for High Performance Computing, Networking, Storage and Analysis
PageCount 14
ParticipantIDs ieee_primary_9926257
acm_books_10_1145_3126908_3126972
acm_books_10_1145_3126908_3126972_brief
PublicationCentury 2000
PublicationDate 20171112
2017-Nov.-12
PublicationDateYYYYMMDD 2017-11-12
PublicationDate_xml – month: 11
  year: 2017
  text: 20171112
  day: 12
PublicationDecade 2010
PublicationPlace New York, NY, USA
PublicationPlace_xml – name: New York, NY, USA
PublicationSeriesTitle ACM Conferences
PublicationTitle International Conference for High Performance Computing, Networking, Storage and Analysis (Online)
PublicationTitleAbbrev SC
PublicationYear 2017
Publisher ACM
Publisher_xml – name: ACM
SSID ssib050161540
ssj0003204180
Score 1.8076944
SourceID ieee
acm
SourceType Publisher
StartPage 1
SubjectTerms Analytical models
Codes
Compiler-based Instrumentation
Computational modeling
Computing methodologies -- Modeling and simulation -- Model development and analysis -- Model verification and validation
Computing methodologies -- Modeling and simulation -- Simulation support systems -- Simulation tools
Fault Injection
Hardware -- Emerging technologies -- Analysis and design of emerging devices and systems
High-Performance Computing
Instruments
Performance evaluation
Program processors
Resilience
Runtime
Software and its engineering -- Software notations and tools -- Compilers
Subtitle realistic fault injection via compiler-based instrumentation for accuracy, portability and speed
Title REFINE
URI https://ieeexplore.ieee.org/document/9926257