Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems
| Published in: | IEEE Transactions on Parallel and Distributed Systems, Vol. 24, no. 3, pp. 493-505 |
|---|---|
| Main Author: | Jichiang Tsai |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 01.03.2013 |
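| Subjects: | Algorithm design and analysis; checkpointing; Complexity theory; Distributed systems; global snapshots; Hypercubes; message passing; Process control; process symmetry; Program processors; Time factors; Vectors |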
| ISSN: | 1045-9219 |
| Abstract | Most existing global-snapshot algorithms for distributed systems use control messages to coordinate the construction of a global snapshot among all processes. Because these algorithms typically assume that the underlying logical overlay topology is fully connected, the number of control messages exchanged among all processes is proportional to the square of the number of processes, which raises the likelihood of network congestion. Hence, such algorithms are neither efficient nor scalable for a large-scale distributed system composed of a huge number of processes. Recent efforts have significantly reduced the number of control messages, but at the cost of a longer response time. In this paper, we propose an efficient global-snapshot algorithm that lets every process finish its local snapshot in a given number of rounds. In particular, the algorithm allows a tradeoff between response time and message complexity. Moreover, our global-snapshot algorithm is symmetrical in the sense that identical steps are executed by every process. This means that our algorithm can achieve better workload balance and less network congestion. Most importantly, based on our framework, we demonstrate that the minimum number of control messages required by a symmetrical global-snapshot algorithm is Ω(N log N), where N is the number of processes. Finally, we assume non-FIFO channels. |
|---|---|
| Author | Jichiang Tsai |
| Author_xml | – sequence: 1 givenname: Jichiang surname: Tsai fullname: Tsai, Jichiang |
| CODEN | ITDSEO |
| CitedBy_id | crossref_primary_10_1007_s00224_014_9599_8 crossref_primary_10_1016_j_jpdc_2013_09_009 crossref_primary_10_3390_s20226446 crossref_primary_10_1177_0037549713485499 crossref_primary_10_3390_electronics11071127 |
| Cites_doi | 10.1109/71.780865 10.1088/0967-1846/2/4/005 10.1145/357360.357365 10.1145/359545.359563 10.1109/TPDS.2010.24 10.1007/3-540-51687-5_50 10.1109/TPDS.2009.108 10.1007/BF01782776 10.1002/0471721271 10.1109/TSE.2007.1000 10.1145/214451.214456 10.1145/1006209.1006248 10.1016/0020-0190(87)90125-6 10.1109/IPDPS.2011.56 10.1145/568522.568525 10.1007/3-540-51687-5_37 10.1145/93385.93398 10.1006/jpdc.1993.1075 10.1109/32.263754 10.1109/TSE.1987.232562 10.1016/j.jpdc.2008.08.003 10.1109/71.798312 |
| ContentType | Journal Article |
| DBID | 97E RIA RIE AAYXX CITATION |
| DOI | 10.1109/TPDS.2012.139 |
| Discipline | Engineering Computer Science |
| EndPage | 505 |
| ExternalDocumentID | 10_1109_TPDS_2012_139 6197182 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 6 |
| ISSN | 1045-9219 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| LinkModel | DirectLink |
| PageCount | 13 |
| PublicationCentury | 2000 |
| PublicationDate | 2013-03-01 |
| PublicationDateYYYYMMDD | 2013-03-01 |
| PublicationDate_xml | – month: 03 year: 2013 text: 2013-03-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | IEEE transactions on parallel and distributed systems |
| PublicationTitleAbbrev | TPDS |
| PublicationYear | 2013 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0014504 |
| Score | 2.0611217 |
| SourceID | crossref ieee |
| SourceType | Enrichment Source Index Database Publisher |
| StartPage | 493 |
| SubjectTerms | Algorithm design and analysis; checkpointing; Complexity theory; Distributed systems; global snapshots; Hypercubes; message passing; Process control; process symmetry; Program processors; Time factors; Vectors |
| Title | Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems |
| URI | https://ieeexplore.ieee.org/document/6197182 |
| Volume | 24 |
| WOSCitedRecordID | wos000313816300007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
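
The abstract contrasts coordination over a fully connected overlay, where the number of control messages grows with the square of the number of processes, with the Ω(N log N) bound for symmetrical algorithms. The sketch below illustrates that gap only; it is not the algorithm from the paper. It assumes a generic dimension-by-dimension hypercube exchange among N = 2^d processes, in which every process runs the same steps and sends one message per round. The function names `all_to_all_messages` and `hypercube_exchange` are hypothetical and chosen for this illustration.

```python
def all_to_all_messages(n):
    """Control messages if each of n processes notifies every other one directly."""
    return n * (n - 1)


def hypercube_exchange(dim):
    """Simulate a symmetric exchange among 2**dim processes (illustrative only).

    In round r, every process sends what it currently knows to the partner
    whose id differs from its own in bit r. This is an assumed stand-in for a
    symmetrical protocol, not the method described in the paper.

    Returns (rounds, messages, everyone_heard_from_everyone).
    """
    n = 1 << dim
    known = [{p} for p in range(n)]  # each process starts knowing only itself
    messages = 0
    for r in range(dim):
        # Freeze the state at the start of the round so all sends are simultaneous.
        at_round_start = [set(k) for k in known]
        for p in range(n):
            partner = p ^ (1 << r)
            known[partner] |= at_round_start[p]
            messages += 1
    complete = all(len(k) == n for k in known)
    return dim, messages, complete


if __name__ == "__main__":
    for d in (3, 10):
        n = 1 << d
        rounds, msgs, ok = hypercube_exchange(d)
        print(n, all_to_all_messages(n), rounds, msgs, ok)
```

Under these assumptions, 1024 processes need 1,047,552 messages for direct all-to-all notification but 10,240 messages (N log2 N) over 10 rounds for the hypercube exchange, which shows the kind of tradeoff between message complexity and number of rounds that the abstract describes.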