Lessons Learned from Memory Errors Observed Over the Lifetime of Cielo
Maintaining the performance of high-performance computing (HPC) applications as failures increase is a major challenge for next-generation extreme-scale systems. Recent work demonstrates that hardware failures are expected to become more common. Few existing studies, however, have examined failures...
| Published in: | SC18: International Conference for High Performance Computing, Networking, Storage and Analysis pp. 554 - 565 |
|---|---|
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.11.2018 |