HADFL: Heterogeneity-aware Decentralized Federated Learning Framework

Bibliographic Details
Published in:	2021 58th ACM/IEEE Design Automation Conference (DAC) pp. 1 - 6
Main Authors:	Cao, Jing; Lian, Zirui; Liu, Weihong; Zhu, Zongwei; Ji, Cheng
Format:	Conference Proceeding
Language:	English
Published:	IEEE 05.12.2021
Subjects:	Collaborative work; Computational modeling; Data models; Design automation; Distributed Training; Federated Learning; Heterogeneous Computing; Heterogeneous networks; Machine Learning; Performance evaluation; Training

Description
Summary:	Federated learning (FL) supports training models on geographically distributed devices. However, traditional FL systems adopt a centralized synchronous strategy, which places heavy communication pressure on the central server and poses challenges for model generalization. Existing optimizations of FL either fail to speed up training on heterogeneous devices or suffer from poor communication efficiency. In this paper, we propose HADFL, a framework that supports decentralized asynchronous training on heterogeneous devices. The devices train the model locally on their own data, each using a heterogeneity-aware number of local steps. In each aggregation cycle, devices are selected probabilistically to perform model synchronization and aggregation. Compared with a traditional FL system, HADFL relieves the central server's communication pressure, efficiently utilizes heterogeneous computing power, and achieves a maximum speedup of 3.15x over decentralized-FedAvg and 4.68x over the PyTorch distributed training scheme, with almost no loss of convergence accuracy.
DOI:	10.1109/DAC18074.2021.9586101
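
The training cycle outlined in the summary (heterogeneity-aware local steps, probabilistic device selection, aggregation among the selected devices) can be illustrated with a minimal sketch. This is not the paper's algorithm: the record does not state how local steps or selection probabilities are computed, so the sketch assumes that local steps scale linearly with a device's relative compute speed and that selection probability is proportional to that same speed. The function names `local_steps`, `select_devices`, and `aggregate` are hypothetical, and models are plain lists of floats.

```python
import random


def local_steps(speeds, base_steps):
    """Assumed rule: scale each device's local steps by its speed relative to
    the slowest device, so all devices finish a cycle at about the same time."""
    slowest = min(speeds)
    return [max(1, round(base_steps * s / slowest)) for s in speeds]


def select_devices(weights, k, rng):
    """Sample k distinct device indices, each draw weighted by `weights`.
    The weighting rule is an assumption, not taken from the paper."""
    candidates = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        w = [weights[i] for i in candidates]
        pick = rng.choices(candidates, weights=w, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return chosen


def aggregate(models, selected):
    """Element-wise average of the parameter vectors of the selected devices."""
    n = len(selected)
    dim = len(models[selected[0]])
    return [sum(models[i][d] for i in selected) / n for d in range(dim)]


# Three hypothetical devices with relative compute speeds 1x, 2x, 4x.
speeds = [1.0, 2.0, 4.0]
steps = local_steps(speeds, base_steps=10)  # [10, 20, 40]

# Stand-in "models" after local training (one parameter vector per device).
models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

rng = random.Random(0)
selected = select_devices(speeds, k=2, rng=rng)
merged = aggregate(models, selected)
```

In this sketch only the selected devices exchange and average parameters in a cycle, which is the sense in which no central server has to collect every model; how unselected devices later receive the merged model is left out because the record does not describe it.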