Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems


Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 24, no. 3, pp. 493-505
Main Author: Tsai, Jichiang
Format: Journal Article
Language: English
Published: IEEE 01.03.2013
ISSN: 1045-9219
Abstract Most existing global-snapshot algorithms in distributed systems use control messages to coordinate the construction of a global snapshot among all processes. Since these algorithms typically assume that the underlying logical overlay topology is fully connected, the number of control messages exchanged among all processes is proportional to the square of the number of processes, which raises the likelihood of network congestion. Hence, such algorithms are neither efficient nor scalable for a large-scale distributed system composed of a huge number of processes. Recently, some efforts have significantly reduced the number of control messages, but at the cost of a higher response time. In this paper, we propose an efficient global-snapshot algorithm that lets every process finish its local snapshot in a given number of rounds. In particular, the algorithm allows a tradeoff between response time and message complexity. Moreover, our global-snapshot algorithm is symmetrical in the sense that identical steps are executed by every process. As a result, it achieves better workload balance and less network congestion. Most importantly, based on our framework, we demonstrate that the minimum number of control messages required by a symmetrical global-snapshot algorithm is Ω(N log N), where N is the number of processes. Finally, we also assume non-FIFO channels.
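The tradeoff between rounds and control messages described in the abstract can be illustrated with a small simulation. The sketch below is not the algorithm from the paper; it assumes a generic symmetric dissemination scheme on a generalized hypercube, in which N = radix^rounds processes each send to radix - 1 partners per round. All function names and parameters are hypothetical and chosen only for illustration. Under this assumption, one round reproduces the N(N - 1) cost of a fully connected exchange, while log2(N) rounds with radix 2 cost N log2(N) messages.

```python
def all_to_all_messages(n: int) -> int:
    """Control messages when every process sends to every other process once."""
    return n * (n - 1)


def generalized_hypercube_messages(radix: int, rounds: int) -> int:
    """Closed-form message count for the assumed scheme: N * rounds * (radix - 1)."""
    n = radix ** rounds
    return n * rounds * (radix - 1)


def simulate_dissemination(radix: int, rounds: int):
    """Simulate the assumed symmetric exchange (illustrative, not the paper's algorithm).

    Each process id is read as a `rounds`-digit number in base `radix`.
    In round d, every process sends everything it knows to the radix - 1
    processes whose ids differ from its own only in digit d.
    Returns (messages_sent, every_process_heard_from_all).
    """
    n = radix ** rounds
    known = [{p} for p in range(n)]  # ids each process has heard from so far
    messages = 0
    for d in range(rounds):
        weight = radix ** d
        # Messages carry what the sender knew at the start of the round.
        at_round_start = [set(s) for s in known]
        for p in range(n):
            digit = (p // weight) % radix
            for v in range(radix):
                if v == digit:
                    continue
                q = p + (v - digit) * weight  # partner differing only in digit d
                known[q] |= at_round_start[p]
                messages += 1
    return messages, all(len(s) == n for s in known)


if __name__ == "__main__":
    # Same N = 16 processes, different round budgets.
    for radix, rounds in [(16, 1), (4, 2), (2, 4)]:
        sent, complete = simulate_dissemination(radix, rounds)
        print(f"rounds={rounds} radix={radix} messages={sent} complete={complete}")
```

In this sketch every process executes identical steps, which mirrors the symmetry property named in the abstract, and the round budget is the knob that trades response time against message count.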
Author Jichiang Tsai
CODEN ITDSEO
ContentType Journal Article
DOI 10.1109/TPDS.2012.139
Discipline Engineering
Computer Science
EndPage 505
ExternalDocumentID 10_1109_TPDS_2012_139
6197182
Genre orig-research
ISICitedReferencesCount 6
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
PageCount 13
PublicationCentury 2000
PublicationDate 2013-03-01
PublicationDateYYYYMMDD 2013-03-01
PublicationDecade 2010
PublicationTitle IEEE transactions on parallel and distributed systems
PublicationTitleAbbrev TPDS
PublicationYear 2013
Publisher IEEE
StartPage 493
SubjectTerms Algorithm design and analysis
checkpointing
Complexity theory
Distributed systems
global snapshots
Hypercubes
message passing
Process control
process symmetry
Program processors
Time factors
Vectors
Title Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems
URI https://ieeexplore.ieee.org/document/6197182
Volume 24