A Scalable Checkpoint Encoding Algorithm for Diskless Checkpointing
| Published in: | 2008 11th IEEE High Assurance Systems Engineering Symposium, pp. 71-79 |
|---|---|
| Main authors: | |
| Format: | Conference paper |
| Language: | English |
| Publisher: | IEEE, 1 December 2008 |
| Subject: | |
| ISBN: | 0769534821, 9780769534824 |
| ISSN: | 1530-2059 |
| Abstract: | Diskless checkpointing is an efficient technique for saving the state of a long-running application in a distributed environment without relying on stable storage. In this paper, we introduce several scalable encoding strategies into diskless checkpointing and reduce the overhead to survive $k$ failures in $p$ processes from $2\lceil \log p \rceil \cdot k\left((\beta + 2\gamma)m + \alpha\right)$ to $\left(1 + O(1/\sqrt{m})\right) \cdot k(\beta + 2\gamma)m$, where $\alpha$ is the communication latency, $1/\beta$ is the network bandwidth between processes, $1/\gamma$ is the rate at which calculations are performed, and $m$ is the size of the local checkpoint per process. The introduced algorithm is scalable in the sense that the overhead to survive $k$ failures in $p$ processes does not increase as the number of processes $p$ increases. We evaluate the performance overhead of the introduced algorithm using a preconditioned conjugate gradient equation solver as an example. Experimental results demonstrate that the introduced techniques are highly scalable. |
| DOI: | 10.1109/HASE.2008.13 |
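As a rough illustration of the two cost expressions quoted in the abstract, the sketch below evaluates both for a few process counts. It is not the paper's encoding algorithm, only its stated cost model. The base-2 logarithm, the constant `c` standing in for the $O(1/\sqrt{m})$ term, and all machine parameters (`alpha`, `beta`, `gamma`, `m`, `k`) are hypothetical values chosen for illustration, not figures from the paper.

```python
import math


def tree_overhead(p, k, m, alpha, beta, gamma):
    """Baseline cost from the abstract: 2*ceil(log p)*k*((beta + 2*gamma)*m + alpha).

    Assumption: log is base 2, as for a binary reduction/broadcast tree.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    return 2 * math.ceil(math.log2(p)) * k * ((beta + 2 * gamma) * m + alpha)


def scalable_overhead(k, m, beta, gamma, c=1.0):
    """Scalable cost from the abstract: (1 + O(1/sqrt(m)))*k*(beta + 2*gamma)*m.

    The abstract gives only the asymptotic O(1/sqrt(m)) term; the constant c
    is a hypothetical stand-in. Note that p does not appear at all.
    """
    return (1 + c / math.sqrt(m)) * k * (beta + 2 * gamma) * m


if __name__ == "__main__":
    # Hypothetical parameters, not measurements from the paper.
    alpha = 1e-5      # communication latency, seconds
    beta = 1e-9       # seconds per byte transferred (1/bandwidth)
    gamma = 1e-10     # seconds per byte of encoding arithmetic
    m = 100_000_000   # local checkpoint size per process, bytes
    k = 2             # number of failures to tolerate

    for p in (16, 256, 4096):
        tree = tree_overhead(p, k, m, alpha, beta, gamma)
        scal = scalable_overhead(k, m, beta, gamma)
        print(f"p={p:5d}  tree={tree:.4f}s  scalable={scal:.4f}s")
```

With these assumed values, the tree-based cost grows with $\lceil \log_2 p \rceil$, while the scalable estimate is identical for every $p$, which is the sense of scalability the abstract describes.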

