FedSA: A Semi-Asynchronous Federated Learning Mechanism in Heterogeneous Edge Computing


Bibliographic details
Published in: IEEE Journal on Selected Areas in Communications, Vol. 39, No. 12, pp. 3654-3672
Main authors: Ma, Qianpiao; Xu, Yang; Xu, Hongli; Jiang, Zhida; Huang, Liusheng; Huang, He
Format: Journal Article
Language: English
Published: New York: IEEE, 1 December 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0733-8716, 1558-0008
Description
Summary: Federated learning (FL) involves training machine learning models over distributed edge nodes (i.e., workers) while facing three critical challenges: edge heterogeneity, Non-IID data, and communication resource constraints. In synchronous FL, the parameter server has to wait for the slowest workers, which leads to significant waiting time under edge heterogeneity. Asynchronous FL copes well with edge heterogeneity, but it requires frequent model transfers and therefore consumes massive communication resources. Moreover, differences in how often workers participate in asynchronous updating may seriously hurt training accuracy, especially on Non-IID data. In this paper, we propose a semi-asynchronous federated learning mechanism (FedSA), in which the parameter server aggregates a certain number of local models in each round, taken in order of arrival. We theoretically analyze the quantitative relationship between the convergence bound of FedSA and several factors, e.g., the number of participating workers in each round, the degree to which the data is Non-IID, and edge heterogeneity. Based on the convergence bound, we present an efficient algorithm that determines the number of participating workers so as to minimize the training completion time. To further improve training accuracy on Non-IID data, FedSA assigns adaptive learning rates to workers according to their relative participation frequency. We extend the proposed mechanism to dynamic and multiple-learning-task scenarios. Experimental results on a testbed show that the proposed mechanism and algorithms address the three challenges more effectively than state-of-the-art solutions.
DOI: 10.1109/JSAC.2021.3118435
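
Note: the semi-asynchronous idea in the summary (the server aggregates a fixed number of local models per round, in order of arrival, and workers get learning rates that depend on how often they participate) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the function names, the fixed per-worker compute times, and the inverse-frequency scaling rule are all hypothetical choices made here for illustration, since the record does not give the actual update rules.

```python
import heapq


def simulate_semi_async(compute_times, m, rounds):
    """Simulate which workers the server aggregates in each round.

    compute_times: assumed fixed time each worker needs for one local update.
    m: number of local models aggregated per round (must not exceed the worker count).
    Returns (list of participant lists per round, participation counts).
    """
    # Each heap entry is (finish_time, worker_id); all workers start at t = 0.
    heap = [(t, w) for w, t in enumerate(compute_times)]
    heapq.heapify(heap)
    counts = [0] * len(compute_times)
    history = []
    for _ in range(rounds):
        # The server takes the first m local models in order of arrival.
        arrivals = [heapq.heappop(heap) for _ in range(m)]
        round_end = arrivals[-1][0]
        participants = [w for _, w in arrivals]
        for w in participants:
            counts[w] += 1
            # Participants receive the new global model and start training again.
            heapq.heappush(heap, (round_end + compute_times[w], w))
        # Workers not selected keep computing; their models arrive in later rounds.
        history.append(participants)
    return history, counts


def adaptive_learning_rates(counts, base_lr):
    """Assumed rule: scale the rate inversely to relative participation frequency."""
    total = sum(counts)
    n = len(counts)
    rates = []
    for c in counts:
        if c == 0:
            # No participation observed yet: fall back to the base rate.
            rates.append(base_lr)
        else:
            freq = c / total  # relative participation frequency
            # A worker with the uniform share 1/n keeps exactly base_lr.
            rates.append(base_lr / (n * freq))
    return rates


history, counts = simulate_semi_async([1.0, 2.0, 4.0], m=2, rounds=3)
rates = adaptive_learning_rates(counts, base_lr=0.1)
```

With three workers whose compute times are 1, 2, and 4 time units and m = 2, the fast workers are aggregated more often, and the assumed scaling rule gives the slowest worker the largest learning rate to compensate for its lower participation frequency.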