Fair Selection of Edge Nodes to Participate in Clustered Federated Multitask Learning

Bibliographic Details
Published in: IEEE Transactions on Network and Service Management, Vol. 20, No. 2, pp. 1502-1516
Authors: Albaseer, Abdullatif Mohammed, Abdallah, Mohamed, Al-Fuqaha, Ala, Seid, Abegaz Mohammed, Erbad, Aiman, Dobre, Octavia A.
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2023
ISSN: 1932-4537
Online access: Full text
Description
Abstract: Clustered federated multitask learning is introduced as an efficient technique when data is unbalanced and distributed among clients in a non-independent and identically distributed (non-i.i.d.) manner. While a similarity metric can provide client groups with specialized models according to their data distributions, this process can be time-consuming because the server must first capture the data distributions of all clients to perform correct clustering. Due to resource and time constraints at the network edge, only a fraction of the devices is selected every round, which calls for an efficient scheduling technique. This paper therefore introduces a two-phased client selection and scheduling approach that improves convergence speed while capturing all data distributions. The approach ensures correct clustering and fairness between clients by leveraging bandwidth reuse for participants that spend a longer time training their models and by exploiting device heterogeneity to schedule participants according to their delay. The server then performs the clustering based on predetermined thresholds and stopping criteria. When a given cluster approaches its stopping point, the server switches to a greedy selection for that cluster, picking the devices with lower delay and better resources. A convergence analysis is provided that relates the proposed scheduling approach to the convergence rate of the specialized models and yields convergence bounds under non-i.i.d. data distributions. Extensive simulations demonstrate that the proposed algorithms reduce training time and improve convergence speed by up to 50% while equipping every user with a customized model tailored to its data distribution.
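
The abstract describes the two-phase selection only at a high level. The Python sketch below illustrates one possible reading of that idea under stated assumptions: the Client fields, the channel/deadline model, the bandwidth-reuse admission rule, and the function names (phase_one_fair_schedule, phase_two_greedy_schedule) are hypothetical illustrations, not the authors' algorithm.

from dataclasses import dataclass

@dataclass
class Client:
    # Hypothetical edge-device profile; fields are illustrative, not the paper's notation.
    cid: int
    delay: float              # estimated per-round training + upload delay (seconds)
    participated: bool = False
    cluster: int = -1         # -1 means not yet assigned to a cluster

def phase_one_fair_schedule(clients, num_channels, deadline):
    """Phase 1 (sketch): schedule clients that have not yet participated so the server
    eventually observes every data distribution. Channels serve clients back-to-back,
    so the time freed by fast clients is reused to admit slower ones within the round
    deadline (a stand-in for the paper's bandwidth-reuse idea)."""
    pending = sorted((c for c in clients if not c.participated), key=lambda c: c.delay)
    channel_time = [0.0] * num_channels      # accumulated busy time per channel
    selected = []
    for c in pending:
        ch = min(range(num_channels), key=lambda i: channel_time[i])  # earliest-free channel
        if channel_time[ch] + c.delay <= deadline:
            channel_time[ch] += c.delay
            c.participated = True
            selected.append(c)
    return selected

def phase_two_greedy_schedule(clients, cluster_id, budget):
    """Phase 2 (sketch): once a cluster nears its stopping criterion, greedily pick its
    lowest-delay (best-resourced) members to finish that specialized model quickly."""
    members = [c for c in clients if c.cluster == cluster_id]
    return sorted(members, key=lambda c: c.delay)[:budget]

# Example usage with made-up delays (seconds) and two orthogonal channels.
pool = [Client(cid=i, delay=d) for i, d in enumerate([3.0, 12.0, 5.0, 20.0, 8.0])]
round_1 = phase_one_fair_schedule(pool, num_channels=2, deadline=25.0)
print([c.cid for c in round_1])

Scheduling fast clients first and packing slower ones into the freed slack is meant to mirror the fairness goal stated in the abstract: every client eventually participates, so no data distribution is missed before clustering.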
DOI: 10.1109/TNSM.2023.3270168