A symmetric O(n log n) message distributed snapshot algorithm for large-scale systems

Bibliographic Details
Published in: 2009 IEEE International Conference on Cluster Computing and Workshops, pp. 1-4
Main Author: Kshemkalyani, A.D.
Format: Conference Proceeding
Language: English
Published: IEEE 01.08.2009
ISBN: 9781424450114, 142445011X
ISSN: 1552-5244
Description
Summary: This paper presents an O(n log n) message distributed snapshot algorithm for a system with non-FIFO channels, where n is the number of processors. The algorithm has applications in checkpointing for large-scale supercomputers and distributed systems that have a fully connected logical topology over a large number of processors. Each processor sends log n messages in the algorithm. The sizes of the messages are geometrically distributed, and the sum of the sizes of the messages sent by any processor is n. The response time of the algorithm is O(log n). The algorithm is fully distributed and the role of each processor is symmetric, unlike tree-based, ring-based, and centralized algorithms.
DOI: 10.1109/CLUSTR.2009.5289139
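
Illustration: the message counts quoted in the summary can be reproduced with a recursive-doubling exchange on a fully connected system. This record does not describe the paper's actual protocol, so the sketch below is only an assumed pattern that happens to match the stated bounds: in round k, processor p exchanges everything it knows with partner p XOR 2^k. The function name `simulate_exchange` and the restriction to n being a power of two are assumptions made for the illustration. The sketch counts messages and sizes only; it does not model non-FIFO channels or the recording of in-transit messages.

```python
def simulate_exchange(n):
    """Simulate a recursive-doubling exchange among n processors.

    Assumption for this sketch: n is a power of two, and in round k
    processor p sends all local states it currently knows to p XOR 2**k.
    Returns (known, sizes), where known[p] is the set of processor states
    p holds at the end and sizes[p] lists the size of each message p sent.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    rounds = n.bit_length() - 1          # log2(n) rounds
    known = [{p} for p in range(n)]      # each processor starts with its own state
    sizes = [[] for _ in range(n)]
    for k in range(rounds):
        bit = 1 << k
        merged = []
        for p in range(n):
            partner = p ^ bit
            sizes[p].append(len(known[p]))        # message size = states carried
            merged.append(known[p] | known[partner])
        known = merged                   # all exchanges in a round happen in parallel
    return known, sizes


if __name__ == "__main__":
    n = 8
    known, sizes = simulate_exchange(n)
    print("messages per processor:", len(sizes[0]))
    print("total messages:", sum(len(s) for s in sizes))
    print("sizes sent by processor 0:", sizes[0])
    print("states known by processor 0:", sorted(known[0]))
```

Under these assumptions each processor sends log n messages whose sizes double every round (1, 2, 4, ...), the sizes sent by one processor sum to n - 1, which is within one unit of the n stated in the summary, and every processor plays the same role in every round.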