AdapCC: Making Collective Communication in Distributed Machine Learning Adaptive

As deep learning (DL) models continue to grow in size, there is a pressing need for distributed model learning using a large number of devices (e.g., GPUs) and servers. Collective communication among devices/servers (for gradient synchronization, intermediate data exchange, etc.) introduces signif...


Bibliographic Details
Published in: Proceedings of the International Conference on Distributed Computing Systems, pp. 25-35
Main Authors: Zhao, Xiaoyang, Zhang, Zhe, Wu, Chuan
Format: Conference Proceeding
Language: English
Published: IEEE 23.07.2024
ISSN: 2575-8411