Optimization of checkpointing-related I/O for high-performance parallel and distributed computing

Checkpointing, the process of saving program or application state, usually to stable storage, has been the most common fault-tolerance methodology for high-performance applications. The checkpointing rate (how often checkpoints are taken) is driven primarily by the failure rate of the system. If the checkpointing rate...

Bibliographic Details
Published in: The Journal of supercomputing Vol. 46; no. 2; pp. 150-180
Main Authors: Subramaniyan, Rajagopal; Grobelny, Eric; Studham, Scott; George, Alan D.
Format: Journal Article
Language: English
Published: Boston: Springer US, 01.11.2008
ISSN: 0920-8542, 1573-0484