FedPD: A Federated Learning Framework With Adaptivity to Non-IID Data

Bibliographic Details
Published in:	IEEE Transactions on Signal Processing, Vol. 69, pp. 6055-6070
Main Authors:	Zhang, Xinwei; Hong, Mingyi; Dhople, Sairaj; Yin, Wotao; Liu, Yang
Format:	Journal Article
Language:	English
Published:	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2021
Subjects:	Agglomeration; Algorithms; Communication; Complexity theory; Computational modeling; Convergence analysis; Data heterogeneity; Data models; Design optimization; Distributed algorithms; Distributed databases; Federated learning; Heterogeneity; Machine learning algorithms; Servers; Signal processing algorithms
ISSN:	1053-587X, 1941-0476

Description
Summary:	Federated Learning (FL) is popular for communication-efficient learning from distributed data. To utilize data at different clients without moving them to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a computation-then-aggregation model, in which multiple local updates are performed using local data before aggregation. These algorithms fail to work when faced with practical challenges, e.g., the local data not being independent and identically distributed (non-IID). In this paper, we first characterize the behavior of the FedAvg algorithm, and show that without strong and unrealistic assumptions on the problem structure, it can behave erratically. Aiming at designing FL algorithms that are provably fast and require as few assumptions as possible, we propose a new algorithm design strategy from the primal-dual optimization perspective. Our strategy yields algorithms that can deal with non-convex objective functions, achieve the best possible optimization and communication complexity (in a well-defined sense), and accommodate full-batch and mini-batch local computation models. Importantly, the proposed algorithms are communication efficient, in that the communication effort can be reduced as the level of heterogeneity among the local data decreases. In the extreme case where the local data becomes homogeneous, only $\mathcal{O}(1)$ communication is required among the agents. To the best of our knowledge, this is the first algorithmic framework for FL that achieves all the above properties.
DOI:	10.1109/TSP.2021.3115952
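
The computation-then-aggregation model that the summary attributes to FedAvg can be illustrated with a short sketch. This is not the paper's FedPD algorithm and is not taken from the paper. It is a minimal example that assumes least-squares local losses, full-batch gradient descent at each client, and uniform averaging at the server. The names `local_update`, `fedavg`, and `make_clients` are hypothetical.

```python
import numpy as np


def local_update(x, A, b, lr, num_steps):
    """Run num_steps of full-batch gradient descent on (0.5 / m) * ||A x - b||^2."""
    x = x.copy()
    m = A.shape[0]
    for _ in range(num_steps):
        grad = A.T @ (A @ x - b) / m
        x -= lr * grad
    return x


def fedavg(clients, dim, rounds=200, local_steps=5, lr=0.1):
    """FedAvg-style loop: each client refines the global model, the server averages.

    clients is a list of (A, b) pairs, one per client (an assumed data layout).
    """
    x_global = np.zeros(dim)
    for _ in range(rounds):
        # Computation: every client starts from the current global model.
        local_models = [
            local_update(x_global, A, b, lr, local_steps) for A, b in clients
        ]
        # Aggregation: uniform average of the local models.
        x_global = np.mean(local_models, axis=0)
    return x_global


def make_clients(num_clients=4, samples=20, dim=3, seed=0):
    """Synthetic clients whose noiseless data all share one minimizer x_true."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(size=dim)
    clients = []
    for _ in range(num_clients):
        A = rng.normal(size=(samples, dim))
        clients.append((A, A @ x_true))
    return clients, x_true


if __name__ == "__main__":
    clients, x_true = make_clients()
    x_hat = fedavg(clients, dim=x_true.size)
    print("distance to shared minimizer:", np.linalg.norm(x_hat - x_true))
```

In this toy setup every client's loss has the same minimizer, so averaging the local models is benign. The summary's concern is the heterogeneous case, where local minimizers differ and multiple local steps can pull the clients apart before aggregation.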