APapo: An asynchronous parallel optimization method for DNN models
| Published in: | Future Generation Computer Systems, Vol. 152, pp. 317–330 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.03.2024 |
| ISSN: | 0167-739X, 1872-7115 |
| Summary: | To address the challenges of segmentation complexity, high memory usage, long training times, and low device utilization in the parallel optimization of large-scale deep neural network (DNN) models, this paper proposes an asynchronous parallel optimization method, APapo. First, a multi-iteration asynchronous pipeline-parallel schedule is established for model-parallel computing tasks, controlling the scheduling of micro-batch units to address stale gradient updates during asynchronous iteration. Second, given the network model and hardware configuration, a dynamic programming strategy for computing resources and model tasks is designed to achieve dynamic partitioning of model computing tasks and optimal matching to computing resources. Finally, a runtime scheduling optimization strategy for computing resources and model tasks is developed, using improved device streams to maximize the overlap between computation and communication, thereby improving the utilization of computing resources and reducing training time. Experimental results show that APapo achieves fine-grained task partitioning, maximizes the utilization of each GPU, and improves the training speed of large-scale DNN models by 2.8 times on average while maintaining training accuracy, compared with existing parallel optimization methods. |
|---|---|

Highlights:

- We propose an improved optimization strategy for parallel task scheduling in pipelined models: a multi-iteration asynchronous task-management mechanism suitable for large-scale model computing tasks, together with an overall execution framework for computing resources and model tasks that solves model partitioning and device allocation, and resolves stale gradient updates during asynchronous iteration by controlling the micro-batch scheduling process.
- We propose a model segmentation method based on augmented antichains. By transforming the computational tasks of a large-scale DNN model, we construct an antichain directed acyclic graph (DAG) state sequence that conforms to the computational iteration specification; on this basis, and taking the characteristics of the hardware computing resources into account, tasks are partitioned via dynamic programming to achieve a reasonable match between computing tasks and computing resources.
- We design a runtime scheduling strategy for computing resources and tasks. By optimizing the default device streams, dependencies between computation and communication are removed as far as possible, their overlap is maximized, the utilization of computing resources is improved, and the training speed of large-scale DNN models is increased while model accuracy is preserved.
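The multi-iteration asynchronous pipeline schedule is only summarized above; the details are in the paper. As a rough illustration of the kind of schedule involved, the following Python sketch (all names are mine, not from the paper) builds a 1F1B-style per-stage operation sequence and measures the gradient staleness an asynchronous pipeline incurs — the delay that controlled micro-batch scheduling has to compensate for:

```python
def one_f_one_b_schedule(num_stages, num_microbatches, stage):
    """Hypothetical 1F1B-style schedule: the op sequence ('F'/'B', micro-batch)
    that `stage` executes in one training iteration of an async pipeline."""
    ops, f, b = [], 0, 0
    # Warm-up: stage i queues (num_stages - i - 1) forwards before its
    # first backward, so downstream stages have work in flight.
    for _ in range(min(num_stages - stage - 1, num_microbatches)):
        ops.append(("F", f))
        f += 1
    # Steady state: alternate one forward, one backward;
    # cool-down: drain the remaining backwards once forwards run out.
    while b < num_microbatches:
        if f < num_microbatches:
            ops.append(("F", f))
            f += 1
        ops.append(("B", b))
        b += 1
    return ops


def max_staleness(ops):
    """Largest number of other forwards issued between a micro-batch's forward
    and its backward -- the gradient delay an async schedule must handle."""
    fwd_index = {mb: i for i, (kind, mb) in enumerate(ops) if kind == "F"}
    worst = 0
    for i, (kind, mb) in enumerate(ops):
        if kind == "B":
            between = sum(1 for k, _ in ops[fwd_index[mb]:i] if k == "F") - 1
            worst = max(worst, between)
    return worst
```

With 4 stages and 6 micro-batches, stage 0 warms up with three forwards and then alternates forwards and backwards; its gradients are computed against weights up to 3 versions old, which illustrates the gradient-delay problem the abstract refers to.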
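The paper's dynamic-programming partitioner operates on an augmented-antichain DAG; as a much-reduced sketch of the underlying idea, here is the classic DP for splitting a linear chain of layers into contiguous stages so that the slowest stage (the pipeline bottleneck) is as fast as possible. The restriction to a chain and all names are mine:

```python
import functools

def partition(layer_costs, num_stages):
    """Split a chain of layers into `num_stages` contiguous stages, minimizing
    the cost of the slowest stage (the pipeline bottleneck).

    Returns (bottleneck, cuts), where cuts[s] is the exclusive end index of
    stage s. Illustrative only: the paper's DP also matches stages to
    heterogeneous devices and works on a DAG, not a simple chain.
    """
    n = len(layer_costs)
    prefix = [0]
    for c in layer_costs:
        prefix.append(prefix[-1] + c)

    @functools.lru_cache(maxsize=None)
    def best(start, stages):
        # Best (bottleneck, cuts) for layers[start:] split into `stages` stages.
        if stages == 1:
            return prefix[n] - prefix[start], (n,)
        best_cost, best_cuts = float("inf"), ()
        # First stage takes layers[start:cut]; leave >= stages-1 layers behind.
        for cut in range(start + 1, n - stages + 2):
            head = prefix[cut] - prefix[start]
            tail_cost, tail_cuts = best(cut, stages - 1)
            cost = max(head, tail_cost)
            if cost < best_cost:
                best_cost, best_cuts = cost, (cut,) + tail_cuts
        return best_cost, best_cuts

    return best(0, num_stages)
```

For example, `partition([2, 3, 4, 5, 6], 3)` cuts after layers 2 and 4, giving stage costs 5, 9, and 6, so the pipeline is limited by a 9-unit stage.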
| DOI: | 10.1016/j.future.2023.11.004 |
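Finally, the benefit of dedicating a separate device stream to communication, so that a chunk's transfer overlaps the next chunk's computation, can be seen with a simple timeline model. This is my own back-of-envelope arithmetic, not the paper's stream implementation:

```python
def step_time(num_chunks, comp, comm, overlap):
    """Wall-clock time to compute `num_chunks` chunks and transmit each result.

    overlap=False models a single device stream: compute and copy serialize.
    overlap=True models a dedicated copy stream: chunk i's transfer runs
    while chunk i+1 is being computed.
    """
    if not overlap:
        return num_chunks * (comp + comm)
    compute_done = 0.0
    comm_done = 0.0
    for _ in range(num_chunks):
        compute_done += comp                             # compute never idles
        comm_done = max(compute_done, comm_done) + comm  # copy waits for data
    return comm_done
```

For 4 chunks with `comp=3` and `comm=2`, serializing costs 20 time units while overlapping costs 14: only the trailing transfer (and any excess of communication over computation time) stays exposed, which is the utilization gain the abstract attributes to improved device streams.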