APapo: An asynchronous parallel optimization method for DNN models



Bibliographic Details
Published in: Future Generation Computer Systems, Vol. 152, pp. 317-330
Main authors: Liu, Shuai; Ju, Tao
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.03.2024
ISSN: 0167-739X, 1872-7115
Online access: Full text
Abstract: To address the challenges of segmentation complexity, high memory usage, long training time, and low device utilization in the parallel optimization of large-scale deep neural network (DNN) models, this paper proposes an asynchronous parallel optimization method, APapo. First, a multi-iteration asynchronous pipeline-parallel schedule is established for model-parallel computing tasks, controlling the scheduling of micro-batch units to address delayed gradient updates during asynchronous iterations. Second, given the network model and hardware configuration, a dynamic programming strategy for computing resources and model tasks is designed to achieve dynamic segmentation of model computing tasks and optimal matching of computing resources. Finally, a runtime scheduling optimization strategy for computing resources and model tasks is developed, using improved device streams to maximize the overlap between computation and communication, thereby improving the utilization of computing resources and reducing training time. Experimental results show that APapo achieves fine-grained task segmentation, maximizes the utilization of each GPU, and on average improves the training speed of large-scale DNN models by 2.8 times compared with existing parallel optimization methods while maintaining model training accuracy.

Highlights:
• We propose an improved optimization strategy for pipeline-parallel task scheduling: a multi-iteration asynchronous parallel task management mechanism for large-scale model computing tasks and an overall execution framework for computing resources and model tasks, which together solve model partitioning and device allocation and address delayed gradient updates during asynchronous iteration by controlling the micro-batch scheduling process.
• We propose a model segmentation method based on augmented antichains. The computational tasks of a large-scale DNN model are transformed into an antichain directed acyclic graph (DAG) state sequence that conforms to the computational iteration specification; combined with the characteristics of the hardware computing resources, tasks are then segmented through dynamic programming to achieve a reasonable match between computing tasks and computing resources.
• We design a runtime scheduling strategy for computing resources and tasks. By optimizing the devices' default streams, dependencies between computation and communication are eliminated as far as possible, the overlap between computation and communication is maximized, the utilization of computing resources is improved, and the training speed of large-scale DNN models is increased while training accuracy is preserved.
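The second highlight describes a dynamic-programming planner that segments model tasks and matches them to computing resources. As a rough illustration of that general idea only (not the paper's actual algorithm), the sketch below balances a chain of per-layer compute costs across a given number of GPUs by minimizing the slowest contiguous stage; the cost model, function names, and example numbers are illustrative assumptions.

```python
# Hypothetical sketch: split a chain of layer costs into `num_gpus` contiguous
# pipeline stages, minimizing the bottleneck stage time via dynamic programming.
from functools import lru_cache
from typing import List, Tuple


def partition_layers(costs: List[float], num_gpus: int) -> Tuple[float, List[int]]:
    """Return (bottleneck stage time, split indices) for a contiguous partition."""
    n = len(costs)
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i: int, k: int) -> float:
        # Minimal bottleneck when layers[i:] are split into k stages.
        if k == 1:
            return prefix[n] - prefix[i]
        result = float("inf")
        for j in range(i + 1, n - k + 2):
            result = min(result, max(prefix[j] - prefix[i], best(j, k - 1)))
        return result

    # Recover the split points from the DP values.
    splits, i = [], 0
    for k in range(num_gpus, 1, -1):
        for j in range(i + 1, n - k + 2):
            if max(prefix[j] - prefix[i], best(j, k - 1)) == best(i, k):
                splits.append(j)
                i = j
                break
    return best(0, num_gpus), splits


if __name__ == "__main__":
    layer_costs = [4.0, 2.0, 7.0, 3.0, 5.0, 1.0]      # illustrative per-layer times
    print(partition_layers(layer_costs, num_gpus=3))  # -> (9.0, [1, 3])
```

A real planner would of course account for memory limits, communication volume between stages, and heterogeneous devices, as the abstract indicates; the sketch only shows the contiguous-partition DP structure.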
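The first and third highlights concern asynchronous iteration with delayed gradient updates and overlapping computation with communication. The following PyTorch-style sketch shows only the generic pattern, not APapo's scheduler: the all-reduce of one iteration's gradients is launched asynchronously and waited on during the next iteration, so communication overlaps the next forward/backward pass at the cost of a one-step gradient delay. It assumes torch.distributed has already been initialized; all names are illustrative.

```python
# Hypothetical sketch: overlap gradient communication with the next iteration's
# computation, which introduces a one-step (delayed) gradient update.
import torch
import torch.distributed as dist


def train(model, optimizer, loader, loss_fn, device):
    params = [p for p in model.parameters() if p.requires_grad]
    world = dist.get_world_size()
    pending = []  # (work handle, parameter, gradient) from the previous iteration

    for batch, target in loader:
        batch, target = batch.to(device), target.to(device)

        # Forward/backward runs while the previous iteration's all-reduce is
        # still in flight; gradients go into plain tensors, not .grad.
        loss = loss_fn(model(batch), target)
        fresh = torch.autograd.grad(loss, params)

        # Drain last iteration's communication and apply the delayed update.
        for work, p, g in pending:
            work.wait()
            p.grad = g / world
        if pending:
            optimizer.step()

        # Launch this iteration's all-reduces asynchronously; they overlap with
        # the next iteration's forward/backward pass.
        pending = [(dist.all_reduce(g, async_op=True), p, g)
                   for p, g in zip(params, [g.contiguous() for g in fresh])]

    # Flush the final delayed update.
    for work, p, g in pending:
        work.wait()
        p.grad = g / world
    if pending:
        optimizer.step()
```

The paper additionally moves communication off the devices' default streams and controls micro-batch scheduling within the pipeline; this sketch only conveys why asynchronous overlap implies the delayed gradient updates that the method must compensate for.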
DOI: 10.1016/j.future.2023.11.004