DistMILE: A Distributed Multi-Level Framework for Scalable Graph Embedding

Bibliographic Details
Published in: Proceedings - International Conference on High Performance Computing, pp. 282-291
Main Authors: He, Yuntian, Gurukar, Saket, Kousha, Pouya, Subramoni, Hari, Panda, Dhabaleswar K., Parthasarathy, Srinivasan
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2021
ISSN: 2640-0316
Description
Summary: Scalable graph embedding on large networks is challenging because of the complexity of graph structures and limited computing resources. Recent research shows that the multi-level framework can enhance the scalability of graph embedding methods with little loss of quality. In general, methods using this framework first coarsen the original graph into a series of smaller graphs and then learn the representations of the original graph from them in an efficient manner. However, to the best of our knowledge, most multi-level methods do not have a parallel implementation. Meanwhile, the emergence of high-performance computing for machine learning provides an opportunity to boost graph embedding with distributed computing. In this paper, we propose a Distributed MultI-Level Embedding (DistMILE) framework to further improve the scalability of graph embedding; our code is available at https://github.com/heyuntian/DistMILE. DistMILE leverages a novel shared-memory parallel algorithm for graph coarsening and a distributed training paradigm for embedding refinement. With the advantage of high-performance computing techniques, DistMILE can smoothly scale different base embedding methods over large networks. Our experiments demonstrate that DistMILE learns representations of quality comparable to other baselines while reducing the time to learn embeddings on large-scale networks to hours. Results show that DistMILE achieves up to a 28x speedup over MILE, a popular multi-level embedding framework, and expedites existing embedding methods with a 40x speedup.
DOI: 10.1109/HiPC53243.2021.00042
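
To make the pipeline in the summary concrete, below is a minimal sketch of the generic multi-level embedding scheme (coarsen, embed the smallest graph, refine back up). It is an illustration under stated assumptions, not DistMILE's implementation: the function names are hypothetical, heavy-edge matching stands in for the paper's coarsening algorithm, a spectral embedding stands in for the pluggable base embedding method, and refinement is reduced to projection through the matching matrices, whereas MILE and DistMILE refine with a trained graph neural network, and DistMILE additionally parallelizes coarsening in shared memory and distributes the refinement training.

    # Hypothetical sketch of a generic multi-level embedding pipeline;
    # not DistMILE's actual code or API.
    import numpy as np

    def coarsen(adj):
        """One coarsening level via simple heavy-edge matching.
        Returns the coarser adjacency matrix and the n_fine x n_coarse
        matching matrix that maps fine nodes to super-nodes."""
        n = adj.shape[0]
        matched = np.full(n, -1)
        groups = []
        for u in range(n):
            if matched[u] >= 0:
                continue
            # Pair u with its heaviest still-unmatched neighbor, if any.
            nbrs = [(adj[u, v], v) for v in range(n)
                    if v != u and adj[u, v] > 0 and matched[v] < 0]
            gid = len(groups)
            if nbrs:
                _, v = max(nbrs)
                matched[u] = matched[v] = gid
                groups.append([u, v])
            else:
                matched[u] = gid
                groups.append([u])
        m = np.zeros((n, len(groups)))
        for gid, members in enumerate(groups):
            for node in members:
                m[node, gid] = 1.0
        return m.T @ adj @ m, m

    def base_embed(adj, dim):
        """Stand-in base embedding: leading eigenvectors of the adjacency
        matrix. Any method (e.g. DeepWalk, node2vec) could be plugged in."""
        vals, vecs = np.linalg.eigh(adj)
        return vecs[:, np.argsort(-np.abs(vals))[:dim]]

    def multilevel_embed(adj, dim=2, levels=2):
        """Coarsen `levels` times, embed the smallest graph, refine upward.
        Refinement here is plain projection through the matchings; MILE and
        DistMILE instead refine with a trained graph neural network."""
        matchings = []
        for _ in range(levels):
            adj, m = coarsen(adj)
            matchings.append(m)
        emb = base_embed(adj, dim)
        for m in reversed(matchings):
            emb = m @ emb
        return emb

    # Tiny usage example: an 8-node ring graph.
    n = 8
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    print(multilevel_embed(adj, dim=2, levels=2).shape)  # -> (8, 2)

On the 8-node ring above, two rounds of matching shrink the graph to two super-nodes, the base method embeds only those, and the matchings project the result back to all eight original nodes. This shrink-then-refine structure is what lets a multi-level framework scale an expensive base embedding method to large graphs, since the base method only ever sees the smallest graph.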