AdaGL: Adaptive Learning for Agile Distributed Training of Gigantic GNNs

Published in: 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6
Main authors: Zhang, Ruisi; Javaheripi, Mojan; Ghodsi, Zahra; Bleiweiss, Amit; Koushanfar, Farinaz
Format: Conference paper
Language: English
Published: IEEE, 09.07.2023
Description
Summary: Distributed GNN training on contemporary massive and densely connected graphs requires information aggregation from all neighboring nodes, which leads to an explosion of inter-server communication. This paper proposes AdaGL, a highly scalable end-to-end framework for rapid distributed GNN training. AdaGL's novelty lies in its adaptive-learning-based graph-allocation engine and its use of multi-resolution coarse representations of dense graphs. As a result, AdaGL achieves highly balanced server computation while minimizing communication overhead. Extensive proof-of-concept evaluations on billion-scale graphs show that AdaGL attains ∼30-40% faster convergence compared with prior art.
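The summary sketches the general recipe: coarsen the dense graph into a multi-resolution representation, then allocate the coarse pieces to servers so that computation stays balanced while inter-server edges (and hence communication) are minimized. The paper's actual engine is not reproduced in this record, so the Python sketch below is only an illustration of that general idea: coarsen(), allocate(), and the balance/communication weight alpha are all hypothetical names, and the greedy placement stands in for the adaptive-learning allocator the abstract describes.

from collections import defaultdict

def coarsen(edges, num_nodes):
    # Merge each node with its heaviest not-yet-matched neighbor
    # (one level of heavy-edge matching).
    weight = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        weight[(min(u, v), max(u, v))] += 1
        adj[u].add(v)
        adj[v].add(u)
    cluster, next_id = {}, 0
    for u in range(num_nodes):
        if u in cluster:
            continue
        free = [v for v in adj[u] if v not in cluster]
        if free:
            mate = max(free, key=lambda v: weight[(min(u, v), max(u, v))])
            cluster[mate] = next_id
        cluster[u] = next_id
        next_id += 1
    return cluster  # node -> cluster id

def allocate(edges, cluster, num_servers, alpha=1.0):
    # Greedy placement: each cluster goes to the server minimizing
    # alpha * resulting load + edges newly cut to already-placed clusters.
    cut = defaultdict(int)           # inter-cluster edge weights
    for u, v in edges:
        cu, cv = cluster[u], cluster[v]
        if cu != cv:
            cut[(cu, cv)] += 1
    size = defaultdict(int)          # cluster size as the load proxy
    for u in cluster:
        size[cluster[u]] += 1
    load = [0] * num_servers
    place = {}
    for c in sorted(size, key=size.get, reverse=True):  # big clusters first
        def cost(s):
            cross = sum(w for (a, b), w in cut.items()
                        if (a == c and place.get(b) not in (None, s)) or
                           (b == c and place.get(a) not in (None, s)))
            return alpha * (load[s] + size[c]) + cross
        best = min(range(num_servers), key=cost)
        place[c] = best
        load[best] += size[c]
    return place  # cluster id -> server

# Toy usage: two triangles joined by one bridge edge, split across 2 servers.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
print(allocate(edges, coarsen(edges, 6), num_servers=2))

The alpha knob mirrors the balance-versus-communication trade-off from the summary: a large alpha pushes the greedy step toward even server loads, while a small alpha favors co-locating heavily connected clusters.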
DOI: 10.1109/DAC56929.2023.10248003