PRT: An Efficient Pipeline Reuse Technology for Large Models Training
| Published in: | Proceedings of the IEEE International Conference on Cluster Computing, pp. 1–11 |
|---|---|
| Main Authors: | , , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 02.09.2025 |
| ISSN: | 2168-9253 |
| Summary: | The rapid evolution of large models and the widespread use of extensive datasets have made training increasingly costly. While pipeline model parallelism makes it possible to train large models, existing pipeline techniques struggle to reduce bubble time because pipeline depth depends strongly on the number of GPUs. This paper introduces a novel pipeline reuse technology, PRT, which breaks the dependence of pipeline depth on the number of GPUs, allowing deeper pipelines even when the number of GPUs is limited. The paper also theoretically demonstrates the feasibility of PRT. Furthermore, the high orthogonality of PRT allows it to be applied to both unidirectional and bidirectional pipelines, further enhancing pipeline efficiency. PRT is evaluated on a server equipped with 8 GPUs, using BERT-series and ResNet-series models on the IMDB and mini-ImageNet datasets. Experimental results show that, for the BERT-series models, unidirectional and bidirectional pipelines with PRT achieve throughput improvements of up to 54.78% and 30.38%, respectively; for the ResNet-series models, the improvements reach up to 76.59% and 26.45%. Additionally, PRT achieves more balanced memory usage, validating its efficiency. |
|---|---|
| DOI: | 10.1109/CLUSTER59342.2025.11186481 |