A Fault-Model-Relevant Classification of Consensus Mechanisms for MPI and HPC
Large-scale HPC systems experience failures arising from faults in hardware, software, and/or networking. Failure rates continue to grow as systems scale up and out. Crash fault tolerance has up to now been the focus when considering means to augment the Message Passing Interface (MPI) for fault-tol...
Saved in:
| Published in: | International journal of parallel programming Vol. 51; no. 2-3; pp. 128 - 149 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: |
New York
Springer US
01.06.2023
Springer Nature B.V |
| Subjects: | |
| ISSN: | 0885-7458, 1573-7640 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Be the first to leave a comment!