Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms



Detailed Bibliography
Published in: IEEE Transactions on Parallel and Distributed Systems, Volume 32, Issue 7, pp. 1702-1712
Main Authors: Cheng, Daning; Li, Shigang; Zhang, Hanping; Xia, Fen; Zhang, Yunquan
Format: Journal Article
Language: English
Published: New York: IEEE, 1 July 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1045-9219, 1558-2183
Description
Summary: As training dataset sizes and model sizes in machine learning grow rapidly, more computing resources are consumed to speed up the training process. However, the scalability and performance reproducibility of parallel machine learning training, which mainly relies on stochastic optimization algorithms, are limited. In this paper, we demonstrate that differences among samples in the dataset play a prominent role in the scalability of parallel machine learning algorithms. We propose to use statistical properties of the dataset to measure these sample differences, including the variance of sample features, sample sparsity, sample diversity, and similarity in sampling sequences. We choose four types of parallel training algorithms as our research objects: (1) the asynchronous parallel SGD algorithm (Hogwild!), (2) the parallel model averaging SGD algorithm (minibatch SGD), (3) the decentralized optimization algorithm, and (4) the dual coordinate optimization algorithm (DADM). Our results show that the statistical properties of the training dataset determine the scalability upper bound of these parallel training algorithms.
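
As a rough illustration of the dataset statistics named in the summary, the short Python sketch below computes the per-feature variance and the sparsity of a sample matrix. It is a minimal sketch assuming dense NumPy arrays; the function name dataset_statistics and the exact definitions are illustrative assumptions, not taken from the paper.

import numpy as np

def dataset_statistics(X):
    """Simple statistics of a sample matrix X with shape (n_samples, n_features)."""
    # Variance of each feature, computed across all samples.
    feature_variance = X.var(axis=0)
    # Sparsity: fraction of zero-valued entries in the whole matrix.
    sparsity = float(np.mean(X == 0))
    return {
        "mean_feature_variance": float(feature_variance.mean()),
        "sparsity": sparsity,
    }

# Example: a synthetic sparse dataset of 1000 samples with 50 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))
X[rng.random(X.shape) < 0.7] = 0.0  # zero out most entries to make samples sparse
print(dataset_statistics(X))
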
DOI: 10.1109/TPDS.2020.3048836