Scalable I/O aggregation for asynchronous multi-level checkpointing
Checkpointing distributed HPC applications is a common I/O pattern with many use cases: resilience, job management, reproducibility, revisiting previous intermediate results, etc. This is a difficult pattern for a large number of processes that need to capture massive data sizes and write them persi...
| Published in: | Future generation computer systems Vol. 160; no. C; pp. 420 - 432 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Netherlands: Elsevier B.V, 01.11.2024 |
| ISSN: | 0167-739X |