AdapCC: Making Collective Communication in Distributed Machine Learning Adaptive
As deep learning (DL) models continue to grow in size, there is a pressing need for distributed model learning using a large number of devices (e.g., GPUs) and servers. Collective communication among devices/servers (for gradient synchronization, intermediate data exchange, etc.) introduces signif...
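The gradient synchronization the abstract refers to is typically performed with an all-reduce collective, most commonly the bandwidth-optimal ring algorithm (reduce-scatter followed by all-gather). As a point of reference only, and not AdapCC's implementation, here is a minimal pure-Python simulation of ring all-reduce across `n` simulated workers; all names are illustrative:

```python
def ring_allreduce(grads):
    """Simulate ring all-reduce over n workers.

    grads: list of per-worker gradient vectors (equal length).
    Returns per-worker vectors after all-reduce (each equals the elementwise sum).
    """
    n = len(grads)
    if n == 1:
        return [list(grads[0])]
    length = len(grads[0])
    # Split each vector into n contiguous chunks (chunks may be uneven or empty).
    bounds = [(r * length) // n for r in range(n + 1)]
    data = [list(g) for g in grads]

    # Phase 1, reduce-scatter: at step t, worker r sends chunk (r - t) mod n to
    # worker (r + 1) mod n, which adds it in. After n-1 steps, worker r holds the
    # fully reduced chunk (r + 1) mod n.
    for step in range(n - 1):
        sends = []
        for r in range(n):
            c = (r - step) % n
            lo, hi = bounds[c], bounds[c + 1]
            sends.append(((r + 1) % n, c, data[r][lo:hi]))
        for dst, c, payload in sends:  # apply after snapshotting (synchronous round)
            lo = bounds[c]
            for i, v in enumerate(payload):
                data[dst][lo + i] += v

    # Phase 2, all-gather: circulate the reduced chunks; receivers overwrite.
    for step in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - step) % n
            lo, hi = bounds[c], bounds[c + 1]
            sends.append(((r + 1) % n, c, data[r][lo:hi]))
        for dst, c, payload in sends:
            lo = bounds[c]
            data[dst][lo:lo + len(payload)] = payload

    return data
```

Each of the 2(n-1) steps moves only 1/n of the data per worker, which is why ring all-reduce is bandwidth-optimal; its fixed communication pattern, however, is exactly the kind of static schedule that adaptive approaches aim to improve on.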
| Published in: | Proceedings of the International Conference on Distributed Computing Systems, pp. 25–35 |
|---|---|
| Main Authors: | |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 23.07.2024 |
| ISSN: | 2575-8411 |