Checkpointing strategies for a fixed-length execution

This work considers checkpointing strategies for a parallel application executing on a large-scale platform whose nodes are subject to failures. The application executes for a fixed duration, namely the length of the reservation that it has been granted. We start with small examples that show the di...

Full description

Saved in:
Bibliographic Details
Published in:SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis pp. 508 - 518
Main Authors: Benoit, Anne, Perotin, Lucas, Robert, Yves, Vivien, Frederic
Format: Conference Proceeding
Language:English
Published: IEEE 17.11.2024
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Be the first to leave a comment!
You must be logged in first