Multi-failure fault-tolerance of embedded loops on hypercubes: issues and performance study

The authors study the multi-failure fault-tolerance of hypercubes. Reconfiguration algorithms are proposed to reallocate the function of failed nodes to spare nodes so the communication structure of the interrupted parallel algorithms is preserved. Both clustered fault and concurrent fault are consi...

Bibliographic Details
Published in: Parallel and Distributed Processing, 2nd IEEE Symposium On, pp. 511-518
Main Authors: Liang, C.T.; Tsai, W.T.
Format: Conference Proceeding
Language: English
Published: IEEE Comput. Soc. Press 1990
Subjects:
ISBN: 0818620870, 9780818620874
Abstract The authors study the multi-failure fault tolerance of hypercubes. Reconfiguration algorithms are proposed that reallocate the functions of failed nodes to spare nodes so that the communication structure of the interrupted parallel algorithm is preserved. Both clustered and concurrent faults are considered. Loops, on which a wide variety of applications have been implemented, are selected as the embedded communication structures. In earlier work, two classes of fault-tolerant embedded loops, Mapping II and Mapping III, were designed and proved one-step reconfigurable for any single failure. From shortest-path algorithms, the authors derive a distributed reconfiguration algorithm for multiple failures on these embedded loops. Reconfigurability under clustered faults is proved for Mapping III. The performance of both mappings is evaluated by simulation, using measures such as the average number of tolerable failures, the average number of job migrations, and the node utilization rate.
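The idea of reassigning a failed node's work to a spare reached by a shortest path can be illustrated with a small sketch. This is not the authors' algorithm, and it does not reproduce Mapping II or Mapping III, whose definitions are not given in this record. It assumes, purely for illustration, a loop laid out in reflected Gray code order on one subcube, the remaining nodes acting as spares, and a centralized breadth-first search standing in for the distributed procedure. The names `gray_loop`, `neighbors`, and `nearest_spare` are hypothetical. Note also that the sketch only locates the closest healthy spare; it makes no attempt to keep the replacement adjacent to the failed node's loop neighbors, which is what the paper's mappings are designed to address.

```python
from collections import deque


def gray_loop(n):
    """Return the nodes of an n-cube in reflected Gray code order.

    Consecutive entries (and the last and first) differ in exactly one
    bit, so the sequence forms a loop embedded in the hypercube.
    """
    return [i ^ (i >> 1) for i in range(1 << n)]


def neighbors(node, n):
    """Return the n hypercube neighbors of `node` (flip one bit each)."""
    return [node ^ (1 << bit) for bit in range(n)]


def nearest_spare(failed, spares, faulty, n):
    """Breadth-first search from a failed node to the closest spare.

    `spares` holds healthy, unused nodes; `faulty` holds every failed
    node, which the search refuses to route through. Returns a pair
    (spare, hop_count), or None when no spare is reachable.
    """
    seen = {failed}
    queue = deque([(failed, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in neighbors(node, n):
            if nb in seen or nb in faulty:
                continue
            if nb in spares:
                return nb, dist + 1
            seen.add(nb)
            queue.append((nb, dist + 1))
    return None


if __name__ == "__main__":
    # Assumed layout: an 8-node loop on the lower half of a 4-cube,
    # with the upper half (nodes 8..15) held back as spares.
    loop = gray_loop(3)
    spares = set(range(8, 16))
    print("loop:", loop)
    print("replacement for node 5:", nearest_spare(5, spares, {5}, 4))
```

A second, clustered failure can be simulated by adding the chosen spare to `faulty`, removing it from `spares`, and calling `nearest_spare` again from that node.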
Author Liang, C.T.
Tsai, W.T.
Author_xml – sequence: 1
  givenname: C.T.
  surname: Liang
  fullname: Liang, C.T.
  organization: Dept. of Comput. Sci., Minnesota Univ., Minneapolis, MN, USA
– sequence: 2
  givenname: W.T.
  surname: Tsai
  fullname: Tsai, W.T.
  organization: Dept. of Comput. Sci., Minnesota Univ., Minneapolis, MN, USA
ContentType Conference Proceeding
DOI 10.1109/SPDP.1990.143594
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EndPage 518
ExternalDocumentID 143594
ISBN 0818620870
9780818620874
IngestDate Tue Aug 26 17:03:07 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 8
ParticipantIDs ieee_primary_143594
PublicationCentury 1900
PublicationDate 19900000
PublicationDateYYYYMMDD 1990-01-01
PublicationDate_xml – year: 1990
  text: 19900000
PublicationDecade 1990
PublicationTitle Parallel and Distributed Processing, 2nd IEEE Symposium On
PublicationTitleAbbrev SPDP
PublicationYear 1990
Publisher IEEE Comput. Soc. Press
Publisher_xml – name: IEEE Comput. Soc. Press
SourceID ieee
SourceType Publisher
StartPage 511
SubjectTerms Clustering algorithms
Computational modeling
Computer networks
Computer science
Fault tolerance
Hardware
Hypercubes
Large-scale systems
Network topology
Parallel algorithms
Title Multi-failure fault-tolerance of embedded loops on hypercubes: issues and performance study
URI https://ieeexplore.ieee.org/document/143594