2D-THA-ADMM: communication efficient distributed ADMM algorithm framework based on two-dimensional torus hierarchical AllReduce

Bibliographic details
Published in: International Journal of Machine Learning and Cybernetics, Vol. 15, No. 2, pp. 207-226
Main authors: Wang, Guozheng; Lei, Yongmei; Zhang, Zeyu; Peng, Cunlu
Medium: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.02.2024
ISSN:1868-8071, 1868-808X
Description
Abstract: Model synchronization refers to the communication process involved in large-scale distributed machine learning tasks. As the cluster scales up, the synchronization of model parameters becomes a challenging task that has to be coordinated among thousands of workers. Firstly, this study proposes a hierarchical AllReduce algorithm structured on a two-dimensional torus (2D-THA), which utilizes a hierarchical structure to synchronize model parameters and maximize bandwidth utilization. Secondly, this study introduces a distributed consensus algorithm called 2D-THA-ADMM, which combines the 2D-THA synchronization algorithm with the alternating direction method of multipliers (ADMM). Thirdly, we evaluate the model parameter synchronization performance of 2D-THA and the scalability of 2D-THA-ADMM on the Tianhe-2 supercomputing platform using real public datasets. Our experiments demonstrate that 2D-THA reduces synchronization time by 63.447% compared to MPI_Allreduce. Furthermore, the proposed 2D-THA-ADMM algorithm exhibits excellent scalability, with a training speed increase of over 3× compared to state-of-the-art methods, while maintaining high accuracy and computational efficiency.
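The abstract describes 2D-THA and 2D-THA-ADMM only at a high level. Below is a minimal Python sketch, not the authors' implementation, of the two ideas being combined. The first is a dimension-wise AllReduce over workers arranged on a rows x cols grid: each torus row and each column is treated as a ring, and everything is simulated sequentially in one process. The second is global-consensus ADMM, whose z-update averages through that AllReduce. The function names, the toy local objective 0.5*||x - a_i||^2, and the parameters `rho` and `iters` are illustrative assumptions.

```python
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # grid[r][c] is the vector held by worker (r, c)


def allreduce_1d(vectors: List[Vector]) -> List[Vector]:
    """Sum-AllReduce within one ring (one torus row or one torus column).

    Simulated sequentially: every member receives the element-wise sum.
    A real ring implementation would pass chunks neighbour-to-neighbour.
    """
    total = [sum(vals) for vals in zip(*vectors)]
    return [list(total) for _ in vectors]


def allreduce_2d_torus(grid: Grid) -> Grid:
    """Hypothetical dimension-wise AllReduce: rows first, then columns."""
    rows, cols = len(grid), len(grid[0])
    # Phase 1: each row ring reduces, so every worker holds its row sum.
    after_rows = [allreduce_1d(grid[r]) for r in range(rows)]
    # Phase 2: each column ring reduces the row sums into the global sum.
    result: Grid = [[[] for _ in range(cols)] for _ in range(rows)]
    for c in range(cols):
        column = allreduce_1d([after_rows[r][c] for r in range(rows)])
        for r in range(rows):
            result[r][c] = column[r]
    return result


def consensus_admm(a: Grid, rho: float = 1.0, iters: int = 100) -> Vector:
    """Global-consensus ADMM for min_x sum_i 0.5 * ||x - a_i||^2.

    The local objective is a hypothetical toy whose x-update has a closed
    form; the z-update averages x_i + u_i via allreduce_2d_torus.
    """
    rows, cols, dim = len(a), len(a[0]), len(a[0][0])
    n = rows * cols
    x = [[[0.0] * dim for _ in range(cols)] for _ in range(rows)]
    u = [[[0.0] * dim for _ in range(cols)] for _ in range(rows)]
    z = [0.0] * dim
    for _ in range(iters):
        # Local x-update: argmin_x 0.5||x - a_i||^2 + (rho/2)||x - z + u_i||^2
        for r in range(rows):
            for c in range(cols):
                x[r][c] = [(ai + rho * (zj - uj)) / (1.0 + rho)
                           for ai, zj, uj in zip(a[r][c], z, u[r][c])]
        # Global z-update: z = mean_i(x_i + u_i), summed by the 2D AllReduce.
        local = [[[xj + uj for xj, uj in zip(x[r][c], u[r][c])]
                  for c in range(cols)] for r in range(rows)]
        summed = allreduce_2d_torus(local)
        z = [s / n for s in summed[0][0]]  # every worker holds the same sum
        # Dual update: u_i += x_i - z
        for r in range(rows):
            for c in range(cols):
                u[r][c] = [uj + xj - zj
                           for uj, xj, zj in zip(u[r][c], x[r][c], z)]
    return z


if __name__ == "__main__":
    grid = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    print(allreduce_2d_torus(grid)[1][1])  # [16.0, 20.0]
    print(consensus_admm(grid))            # approximately [4.0, 5.0]
```

On this 2 x 2 grid, the row phase produces [4.0, 6.0] and [12.0, 14.0], and the column phase gives every worker [16.0, 20.0]. For this toy objective the consensus value is the mean of the a_i, so ADMM converges to [4.0, 5.0]. In an MPI deployment, each phase would more plausibly run as a collective on a row or column sub-communicator. The sketch only illustrates the data flow, not the bandwidth-optimized schedule the paper evaluates.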
DOI:10.1007/s13042-023-01903-9