A Fault-Model-Relevant Classification of Consensus Mechanisms for MPI and HPC

Large-scale HPC systems experience failures arising from faults in hardware, software, and/or networking. Failure rates continue to grow as systems scale up and out. Crash fault tolerance has up to now been the focus when considering means to augment the Message Passing Interface (MPI) for fault-tol...

Bibliographic Details
Published in: International Journal of Parallel Programming, Vol. 51, no. 2-3, pp. 128-149
Main Authors: Nansamba, Grace; Altarawneh, Amani; Skjellum, Anthony
Format: Journal Article
Language: English
Published: New York: Springer US, 01.06.2023
Springer Nature B.V.
ISSN: 0885-7458, 1573-7640