Multi-failure fault-tolerance of embedded loops on hypercubes: issues and performance study


Bibliographic details
Published in: Parallel and Distributed Processing, 2nd IEEE Symposium On, pp. 511-518
Main authors: Liang, C.T.; Tsai, W.T.
Format: Conference proceeding
Language: English
Published: IEEE Comput. Soc. Press, 1990
ISBN: 0818620870, 9780818620874
Description
Summary: The authors study the multi-failure fault tolerance of hypercubes. Reconfiguration algorithms are proposed that reallocate the functions of failed nodes to spare nodes so that the communication structure of the interrupted parallel algorithm is preserved. Both clustered and concurrent faults are considered. Loops are selected as the embedded communication structure, on which a wide variety of applications have been implemented. In earlier work, two classes of fault-tolerant embedded loops, Mapping II and Mapping III, were designed and proved one-step reconfigurable for any single failure. From shortest-path algorithms, the authors derive a distributed reconfiguration algorithm for multiple failures on these embedded loops. Reconfigurability under clustered faults is proved for Mapping III. The performance of both mappings is evaluated by simulation, using measures such as the average number of tolerable failures, the average number of job migrations, and the node utilization rate.
DOI: 10.1109/SPDP.1990.143594
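
Illustrative sketch (not taken from the paper): the summary refers to loops embedded in a hypercube and to a reconfiguration algorithm derived from shortest-path algorithms, but this record does not define Mapping II or Mapping III. As general background only, the Python sketch below shows the standard binary-reflected Gray code embedding of a loop in a hypercube, followed by a plain breadth-first search that finds the nearest healthy spare node for a failed node. The function names, the centralized search, and the choice of spares are assumptions made for illustration; the sketch does not reproduce the authors' mappings or their distributed algorithm, and it does not guarantee that loop adjacency is preserved after reallocation.

```python
from collections import deque


def gray_code_loop(n):
    """Hypercube node labels of an n-cube in binary-reflected Gray code order."""
    return [i ^ (i >> 1) for i in range(2 ** n)]


def is_embedded_loop(loop):
    """True if cyclically consecutive labels differ in exactly one bit,
    i.e. every loop edge maps onto a single hypercube link."""
    pairs = zip(loop, loop[1:] + loop[:1])
    return all(bin(a ^ b).count("1") == 1 for a, b in pairs)


def nearest_spare(n, start, spares, failed):
    """Breadth-first search outward from the failed node `start`.

    Only healthy nodes are traversed. Returns (spare_label, hop_count)
    for the closest healthy spare, or None if none is reachable.
    Hypothetical helper: a centralized stand-in for a shortest-path search.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in spares and node not in failed:
            return node, dist
        for bit in range(n):
            nxt = node ^ (1 << bit)  # neighbor across dimension `bit`
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    # Assumed layout: a 4-node loop in the lower 2-subcube of a 3-cube,
    # with the upper subcube (labels 4..7) held as spares.
    loop = gray_code_loop(2)
    spares = {4, 5, 6, 7}
    print(loop, is_embedded_loop(loop))
    print(nearest_spare(3, 3, spares, failed={3}))
    print(nearest_spare(3, 3, spares, failed={3, 7}))
```

In this assumed layout, a second failure at the adjacent spare forces the search to travel farther, which is the kind of effect that measures such as the number of tolerable failures and job migrations in the summary would capture.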