2D-THA-ADMM: communication efficient distributed ADMM algorithm framework based on two-dimensional torus hierarchical AllReduce


Bibliographic Details
Published in: International Journal of Machine Learning and Cybernetics, Vol. 15, No. 2, pp. 207–226
Authors: Wang, Guozheng; Lei, Yongmei; Zhang, Zeyu; Peng, Cunlu
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.02.2024
ISSN: 1868-8071, 1868-808X
Online Access: Full text
Abstract: Model synchronization refers to the communication process involved in large-scale distributed machine learning tasks. As the cluster scales up, the synchronization of model parameters becomes a challenging task that has to be coordinated among thousands of workers. Firstly, this study proposes a hierarchical AllReduce algorithm structured on a two-dimensional torus (2D-THA), which uses a hierarchical structure to synchronize model parameters and maximize bandwidth utilization. Secondly, this study introduces a distributed consensus algorithm called 2D-THA-ADMM, which combines the 2D-THA synchronization algorithm with the alternating direction method of multipliers (ADMM). Thirdly, we evaluate the model parameter synchronization performance of 2D-THA and the scalability of 2D-THA-ADMM on the Tianhe-2 supercomputing platform using real public datasets. Our experiments demonstrate that 2D-THA significantly reduces synchronization time, by 63.447% compared to MPI_Allreduce. Furthermore, the proposed 2D-THA-ADMM algorithm exhibits excellent scalability, with a training speed increase of over 3× compared to state-of-the-art methods, while maintaining high accuracy and computational efficiency.
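The hierarchical idea summarized in the abstract can be illustrated with a minimal sketch: arrange the P workers in an R × C grid, reduce within rows and then across columns, so every worker ends up with the global average without a single flat AllReduce over all ranks. The sketch below uses mpi4py and NumPy; the grid layout, function name, and the way the averaged model stands in for the ADMM consensus (z) variable are illustrative assumptions, not the authors' implementation of 2D-THA.

```python
# Minimal sketch (assumption): hierarchical AllReduce over an R x C grid of MPI ranks.
# A row-wise allreduce followed by a column-wise allreduce equals a global allreduce,
# but each step only spans a subset of ranks, which is the idea behind a
# two-dimensional hierarchical scheme such as 2D-THA.
import numpy as np
from mpi4py import MPI


def torus_2d_allreduce_mean(local_model: np.ndarray, rows: int, cols: int) -> np.ndarray:
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    assert comm.Get_size() == rows * cols, "rank count must match the grid"

    row_id, col_id = divmod(rank, cols)                # position of this rank in the grid
    row_comm = comm.Split(color=row_id, key=col_id)    # ranks sharing a row
    col_comm = comm.Split(color=col_id, key=row_id)    # ranks sharing a column

    buf = local_model.copy()
    row_comm.Allreduce(MPI.IN_PLACE, buf, op=MPI.SUM)  # sum within each row
    col_comm.Allreduce(MPI.IN_PLACE, buf, op=MPI.SUM)  # combine row sums across columns

    row_comm.Free()
    col_comm.Free()
    return buf / (rows * cols)                         # global average of all local models


if __name__ == "__main__":
    # Example: each worker holds a local ADMM iterate x_i; the averaged model plays
    # the role of the consensus (z) variable in a consensus-ADMM step.
    # Run with: mpiexec -n 4 python sketch.py
    comm = MPI.COMM_WORLD
    x_local = np.full(4, float(comm.Get_rank()))       # toy local parameters
    z = torus_2d_allreduce_mean(x_local, rows=2, cols=2)
    if comm.Get_rank() == 0:
        print("consensus average:", z)
```

The two `Split` calls are the hierarchical element: each synchronization step involves only the ranks of one row or one column of the torus, which is what allows a scheme of this kind to use bandwidth more evenly than a single flat collective over all workers.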
DOI: 10.1007/s13042-023-01903-9