How to recover efficiently and asynchronously when optimism fails

Detailed description

Bibliographic details
Published in: Proceedings of the 16th International Conference on Distributed Computing Systems, pp. 108-115
Main authors: Damani, O.P.; Garg, V.K.
Format: Conference proceeding
Language: English
Published: IEEE, 1996
ISBN: 9780818673993, 0818673990
Abstract: We propose a new algorithm for recovering asynchronously from failures in a distributed computation. Our algorithm is based on two novel concepts: a fault-tolerant vector clock, which maintains causality information in spite of failures, and a history mechanism, which detects orphan states and obsolete messages. These two mechanisms, together with checkpointing and message logging, are used to restore the system to a consistent state after a failure of one or more processes. Our algorithm is completely asynchronous. It handles multiple failures, does not assume any message ordering, causes the minimum amount of rollback, and restores the maximum recoverable state with low overhead. Earlier optimistic protocols lack one or more of these properties.
Author affiliation (Damani, O.P.): Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
DOI: 10.1109/ICDCS.1996.507907
Subject terms: Checkpointing; Clocks; Costs; Distributed computing; Fault detection; Fault tolerance; History; Protocols
URI: https://ieeexplore.ieee.org/document/507907