Modeling and Simulating Multiple Failure Masking Enabled by Local Recovery for Stencil-Based Applications at Extreme Scales

Obtaining multi-process hard failure resilience at the application level is a key challenge that must be overcome before the promise of exascale can be fully realized. Previous work has shown that online global recovery can dramatically reduce the overhead of failures when compared to the more tradi...

Bibliographic details
Published in: IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 10, pp. 2881-2895
Main authors: Gamell, Marc; Teranishi, Keita; Mayo, Jackson; Kolla, Hemanth; Heroux, Michael A.; Chen, Jacqueline; Parashar, Manish
Format: Journal Article
Language: English
Published: New York, IEEE, 1 October 2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:1045-9219, 1558-2183
Online access: Full text