Interval-Asynchrony: Delimited Intervals of Localised Asynchrony for Fast Parallel SGD

Bibliographic Details
Title: Interval-Asynchrony: Delimited Intervals of Localised Asynchrony for Fast Parallel SGD
Authors: Garby, Jacob Stacey, 2001; Tsigas, Philippas, 1967
Source: 31st International Conference on Parallel and Distributed Computing (Euro-Par 2025), Dresden, Germany. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 15901 LNCS, pp. 236-249. Project: Relaxed Semantics Across the Data Analytics Stack (RELAX-DN). Associated artifact: "Artifact of the paper: Interval-Asynchrony: Delimited Intervals of Localised Asynchrony for Fast Parallel SGD".
Subject Terms: Parallel SGD, Asynchronous Data Processing, Staleness, Parallel Algorithms
Access URL: https://research.chalmers.se/publication/548121
Database: SwePub
Description
Abstract: Stochastic gradient descent (SGD) is a crucial optimisation algorithm due to its ubiquity in machine learning applications. Parallelism is a popular approach to scaling SGD, but the standard synchronous formulation struggles due to significant synchronisation overhead. For this reason, asynchronous implementations are increasingly common. These improve throughput at the expense of introducing stale gradients, which reduce model accuracy. Previous approaches to mitigating the downsides of asynchronous processing include adaptively adjusting the number of worker threads or the learning rate, but at their core these remain fully asynchronous and hence still suffer from lower accuracy due to staleness. We propose Interval-Asynchrony, a semi-asynchronous method which retains high throughput while reducing gradient staleness, both on average and with a hard upper bound. Our method achieves this by introducing periodic asynchronous intervals, within which SGD is executed asynchronously but across whose boundaries gradient computations may not cross. The size of these intervals determines the degree of asynchrony, providing an adjustable scale. Since the optimal interval size varies over time, we additionally provide two strategies for adjusting it dynamically. We evaluate our method against several baselines on the CIFAR-10 and CIFAR-100 datasets, and demonstrate a 32% decrease in training time as well as improved scalability up to 128 threads.
ISSN: 1611-3349, 0302-9743
DOI: 10.1007/978-3-031-99857-7_17
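
To make the mechanism described in the abstract concrete, the Python sketch below illustrates the interval-asynchrony control flow under simplifying assumptions: a shared in-memory parameter vector, a user-supplied gradient oracle, and hypothetical names (interval_asynchronous_sgd, grad_fn, interval_size) that do not come from the paper. It is not the authors' implementation, and Python's GIL means it demonstrates only the synchronisation structure (uncoordinated updates inside an interval, a barrier at each boundary so staleness is bounded by the interval size), not real parallel speedup. The paper's two dynamic interval-sizing strategies are not sketched here.

import threading
import numpy as np

def interval_asynchronous_sgd(grad_fn, params, n_workers=4,
                              interval_size=64, n_intervals=50, lr=0.01):
    # Within one interval, workers compute gradients from possibly stale
    # snapshots of `params` and apply them without coordination, so the
    # staleness of any applied gradient is bounded by `interval_size`.
    # A barrier at each interval boundary ensures no gradient computed in
    # one interval is applied in the next.
    barrier = threading.Barrier(n_workers)
    lock = threading.Lock()
    budget = {"steps": interval_size}   # shared per-interval step budget

    def worker(rng):
        for _ in range(n_intervals):
            while True:
                with lock:              # claim one SGD step of this interval
                    if budget["steps"] == 0:
                        break
                    budget["steps"] -= 1
                g = grad_fn(params.copy(), rng)           # possibly stale read
                np.subtract(params, lr * g, out=params)   # uncoordinated in-place update
            if barrier.wait() == 0:                 # interval boundary
                budget["steps"] = interval_size     # one thread refills the budget
            barrier.wait()              # make the refill visible before continuing

    threads = [threading.Thread(target=worker, args=(np.random.default_rng(k),))
               for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params

# Toy usage: noisy gradients of f(x) = ||x - 1||^2 on a shared parameter vector.
noisy_grad = lambda x, rng: 2.0 * (x - 1.0) + 0.1 * rng.standard_normal(x.shape)
x = np.zeros(16)
interval_asynchronous_sgd(noisy_grad, x, interval_size=32)
print(np.round(x, 2))   # approaches the all-ones vector

In this sketch the step budget is shared across workers, so the interval size is a global count of SGD steps rather than a per-worker count; it is that global cap, together with the boundary barrier, that yields the hard upper bound on staleness mentioned in the abstract.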