A Dynamic Sliding Window Based Tensor Communication Scheduling Framework for Distributed Deep Learning

Simultaneous tensor communication can effectively improve the scalability of distributed deep learning on large clusters. However, communicating a fixed number of tensor blocks concurrently violates the priority-based scheduling strategy and cannot minimize communication overhead. In this paper, we...

Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Network Science and Engineering, Vol. 12, No. 2, pp. 1080-1095
Main Authors: Gao, Yunqi; Hu, Bing; Mashhadi, Mahdi Boloursaz; Wang, Wei; Tafazolli, Rahim; Debbah, Merouane
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2025
ISSN: 2327-4697, 2334-329X
Online Access: Full text