Efficient 3-D Processor Array Reconfiguration Algorithms Based on Bucket Effect

Bibliographic details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 43, No. 4, pp. 1023-1036
Main authors: Ding, Hao; He, Yanlong; Zhai, Zhongyi; Li, Zhi; Qian, Junyan; Zhao, Lingzhong
Format: Journal Article
Language: English
Published: New York: IEEE, 1 April 2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:0278-0070, 1937-4151
Description
Abstract: As the density of 3-D processor arrays increases, processor elements (PEs) often fail from overload or overheating during massively parallel computation, so effective fault-tolerance techniques are needed to keep the system reliable. This article investigates an efficient reconfiguration method that constructs a 3-D fault-free logical subarray with more fault-free PEs and a shorter interconnection length (interlength). First, we propose a novel method based on the barrel effect to find the bottleneck plane of a 3-D processor array. Second, we propose an efficient compensation strategy that replaces faulty PEs on adjacent physical planes with fault-free PEs on the bottleneck planes, which makes more fault-free PEs available for constructing the subarray. Then, we propose a heuristic that constructs the subarray and eliminates redundant iterations to accelerate reconstruction. Finally, we propose a heuristic optimization algorithm that reduces the interlength between PEs, which in turn reduces dynamic power consumption and communication cost. In addition, we propose a more accurate method for calculating the lower bound of the interlength, so that the performance of the algorithm can be evaluated more precisely. Simulation experiments show that, compared with the state of the art, on a $128\times 128\times 128$ host array the utilization rate of fault-free PEs can be improved by up to 15.6% and the interlength redundancy can be reduced by 78.2% for random faults. On a $64\times 64\times 64$ host array, the average improvement of the two indicators under clustered faults can reach 93.2% and 69.3%, respectively. Moreover, for all cases considered, the proposed new lower bound and the reconstruction time are reduced by an average of 18.47% and 76.13%, respectively.
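The record does not say how the bottleneck plane is identified. The sketch below illustrates only the general bucket (barrel) effect idea, under the assumption that the bottleneck plane is the physical plane with the fewest fault-free PEs, since that plane would cap the size of any fault-free logical subarray. The fault-map layout and the function names `fault_free_per_plane` and `bottleneck_plane` are hypothetical and are not taken from the article.

```python
from typing import List, Tuple

# Assumed layout: faulty[z][y][x] is True when the PE at (x, y, z) is faulty.
Array3D = List[List[List[bool]]]


def fault_free_per_plane(faulty: Array3D) -> List[int]:
    """Count the fault-free PEs on each plane (planes indexed by the first axis)."""
    return [sum(not pe for row in plane for pe in row) for plane in faulty]


def bottleneck_plane(faulty: Array3D) -> Tuple[int, int]:
    """Return (plane index, fault-free count) for the plane with the fewest
    fault-free PEs. Ties go to the lowest index. This is an illustrative
    assumption about what 'bottleneck plane' means, not the article's method."""
    counts = fault_free_per_plane(faulty)
    index = min(range(len(counts)), key=counts.__getitem__)
    return index, counts[index]


# Hypothetical host array with 3 planes of 2 x 2 PEs and three faulty PEs.
host = [
    [[False, False], [False, True]],   # plane 0: 3 fault-free PEs
    [[True, False], [False, True]],    # plane 1: 2 fault-free PEs
    [[False, False], [False, False]],  # plane 2: 4 fault-free PEs
]

print(fault_free_per_plane(host))  # [3, 2, 4]
print(bottleneck_plane(host))      # (1, 2)
```

In this toy array, plane 1 is the shortest stave of the bucket. A compensation step like the one the abstract describes would then trade PEs between that plane and its neighbours, which this sketch does not attempt.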
DOI: 10.1109/TCAD.2023.3337196