A low-overhead recovery technique using quasi-synchronous checkpointing

Bibliographic details
Published in: Proceedings of the 16th International Conference on Distributed Computing Systems, pp. 100-107
Main authors: Manivannan, D.; Singhal, M.
Format: Conference proceeding
Language: English
Published: IEEE, 1996
ISBN: 9780818673993, 0818673990
Abstract: In this paper, we propose a quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it. The checkpointing algorithm preserves process autonomy by allowing processes to take checkpoints asynchronously, and it uses communication-induced checkpoint coordination to advance the recovery line, which helps bound rollback propagation during recovery. Thus, it combines the simplicity and low overhead of asynchronous checkpointing with the recovery-time advantages of synchronous checkpointing. Checkpointing involves no extra message overhead, and the additional checkpointing overhead is nominal. The algorithm ensures that a recovery line consistent with the latest checkpoint of any process exists at all times. The recovery algorithm exploits this feature to restore the system to a state consistent with the latest checkpoint of a failed process. The recovery algorithm has no domino effect: a failed process needs only to roll back to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. To avoid the domino effect, it uses selective pessimistic message logging at the receiver end. Recovery is asynchronous for a single process failure. Neither the recovery algorithm nor the checkpointing algorithm requires the channels to be FIFO. We do not use vector timestamps to determine dependencies between checkpoints, since vector timestamps generally result in high message overhead during failure-free operation.
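The abstract does not spell out the mechanism, so the following is only a minimal sketch of how communication-induced coordination of this kind might look, under a stated assumption: each process keeps a checkpoint sequence number and piggybacks it on application messages, and a receiver that sees a larger number takes a forced checkpoint before delivering the message. The names `Process`, `Checkpoint`, `take_basic_checkpoint`, and `rollback_target` are hypothetical and chosen for illustration; selective message logging and the recovery protocol itself are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    sn: int        # sequence number assigned to this checkpoint
    state: list    # copy of the delivered-message log at checkpoint time
    forced: bool   # True if the checkpoint was induced by a received message


@dataclass
class Process:
    pid: int
    sn: int = 0
    state: list = field(default_factory=list)
    checkpoints: list = field(default_factory=list)

    def __post_init__(self):
        # Assumption: every process starts with an initial checkpoint numbered 0.
        self._save(forced=False)

    def _save(self, forced):
        self.checkpoints.append(Checkpoint(self.sn, list(self.state), forced))

    def take_basic_checkpoint(self):
        # Autonomous (asynchronous) checkpoint: advance the local number.
        self.sn += 1
        self._save(forced=False)

    def send(self, payload):
        # The sequence number rides on the application message,
        # so no separate control message is needed.
        return (self.sn, payload)

    def receive(self, message):
        msg_sn, payload = message
        if msg_sn > self.sn:
            # Forced checkpoint taken BEFORE delivery, so the saved state
            # does not reflect a message sent after the sender's checkpoint.
            self.sn = msg_sn
            self._save(forced=True)
        self.state.append(payload)

    def rollback_target(self, failed_sn):
        # Earliest local checkpoint numbered at or beyond the failed
        # process's latest checkpoint; None if no such checkpoint exists.
        for cp in self.checkpoints:
            if cp.sn >= failed_sn:
                return cp
        return None
```

In this sketch, message order on a channel plays no role in the decision to checkpoint, which is in the spirit of the abstract's remark that FIFO channels are not required. Comparing a single integer per message, rather than a vector timestamp, keeps the piggybacked data constant in size.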
Author affiliations:
Manivannan, D.: Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
Singhal, M.: Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
DOI: 10.1109/ICDCS.1996.507906
Subject terms: Checkpointing, Degradation, Protocols
URL: https://ieeexplore.ieee.org/document/507906