A low-overhead recovery technique using quasi-synchronous checkpointing
In this paper, we propose a quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it. The checkpointing algorithm preserves process autonomy by allowing them to take checkpoints asynchronously and uses communication-induced checkpoint coordination for the progressi...
| Published in: | Proceedings of the 16th International Conference on Distributed Computing Systems, pp. 100-107 |
|---|---|
| Main authors: | Manivannan, D.; Singhal, M. |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 1996 |
| Subjects: | Checkpointing; Degradation; Protocols |
| ISBN: | 9780818673993, 0818673990 |
| Online access: | Full text |
| Abstract | In this paper, we propose a quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it. The checkpointing algorithm preserves process autonomy by allowing processes to take checkpoints asynchronously, and it uses communication-induced checkpoint coordination to advance the recovery line, which helps bound rollback propagation during recovery. Thus, it has the simplicity and low overhead of asynchronous checkpointing and the recovery-time advantages of synchronous checkpointing. Checkpointing involves no extra message overhead, and the additional checkpointing overhead is nominal. The algorithm ensures that a recovery line consistent with the latest checkpoint of any process exists at all times. The recovery algorithm exploits this feature to restore the system to a state consistent with the latest checkpoint of a failed process. The recovery algorithm has no domino effect: a failed process needs only to roll back to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. To avoid the domino effect, it uses selective pessimistic message logging at the receiver end. Recovery is asynchronous for a single process failure. Neither the recovery algorithm nor the checkpointing algorithm requires the channels to be FIFO. We do not use vector timestamps to determine dependencies between checkpoints, since vector timestamps generally result in high message overhead during failure-free operation. |
|---|---|
| Author | Manivannan, D.; Singhal, M. |
| Author_xml | 1. Manivannan, D., Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA; 2. Singhal, M., Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ICDCS.1996.507906 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Xplore IEEE Proceedings Order Plans (POP All) 1998-Present |
| Discipline | Computer Science |
| EndPage | 107 |
| ExternalDocumentID | 507906 |
| ISBN | 9780818673993, 0818673990 |
| ISICitedReferencesCount | 78 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 8 |
| PublicationCentury | 1900 |
| PublicationDate | 19960000 |
| PublicationDateYYYYMMDD | 1996-01-01 |
| PublicationDate_xml | – year: 1996 text: 19960000 |
| PublicationDecade | 1990 |
| PublicationTitle | Proceedings of the 16th International Conference on Distributed Computing Systems |
| PublicationTitleAbbrev | ICDCS |
| PublicationYear | 1996 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| StartPage | 100 |
| SubjectTerms | Checkpointing; Degradation; Protocols |
| Title | A low-overhead recovery technique using quasi-synchronous checkpointing |
| URI | https://ieeexplore.ieee.org/document/507906 |
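
The abstract describes communication-induced checkpoint coordination that adds no extra messages and avoids vector timestamps, but this record does not spell out the mechanism. The following is a minimal sketch of one common way to realize that idea, assuming each process keeps a single integer checkpoint sequence number and piggybacks it on every application message. The names `Process`, `Message`, `basic_checkpoint`, and `receive` are hypothetical, chosen for illustration, and the sketch should not be read as the authors' exact algorithm. It omits the selective message logging and the recovery procedure entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # sender's checkpoint sequence number, piggybacked (assumed scheme)


@dataclass
class Process:
    pid: int
    sn: int = 0                                      # sequence number of the latest checkpoint
    state: list = field(default_factory=list)        # stand-in for application state
    checkpoints: dict = field(default_factory=dict)  # sn -> saved copy of state

    def __post_init__(self):
        self._save()  # initial checkpoint with sequence number 0

    def _save(self):
        # Record a copy of the current state under the current sequence number.
        self.checkpoints[self.sn] = list(self.state)

    def basic_checkpoint(self):
        # Checkpoint taken on the process's own schedule (asynchronous part).
        self.sn += 1
        self._save()

    def send(self, payload):
        # No separate control message: the sequence number rides on the payload.
        return Message(payload, self.sn)

    def receive(self, msg):
        # Forced checkpoint: if the sender is ahead, checkpoint BEFORE delivering,
        # so the message is received after the checkpoint numbered msg.sn.
        if msg.sn > self.sn:
            self.sn = msg.sn
            self._save()
        self.state.append(msg.payload)


# Usage: p checkpoints on its own, then its message forces q to catch up.
p, q = Process(pid=0), Process(pid=1)
p.basic_checkpoint()
q.receive(p.send("hello"))
```

Under these assumptions, a message sent after a checkpoint numbered `k` is never delivered before the receiver has a checkpoint numbered at least `k`, which is the property that keeps checkpoints with matching sequence numbers free of orphan messages. A single integer per message is also what keeps the failure-free overhead below that of vector timestamps.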

