Optimizing Distributed ML Communication with Fused Computation-Collective Operations

Machine learning models are distributed across multiple nodes using numerous parallelism strategies. The resulting collective communication is often on the critical path due to a lack of independent coarse-grain computation kernels available to execute. In this work, we propose fusing computation wi...

Bibliographic Details
Published in: SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17
Main Authors: Punniyamurthy, Kishore; Hamidouche, Khaled; Beckmann, Bradford M.
Format: Conference Proceeding
Language: English
Published: IEEE 17.11.2024