PRT: An Efficient Pipeline Reuse Technology for Large Models Training
| Published in: | Proceedings of the IEEE International Conference on Cluster Computing, pp. 1–11 |
|---|---|
| Main Authors: | , , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 02.09.2025 |
| ISSN: | 2168-9253 |
| Summary: | The rapid evolution of large models and the widespread use of extensive datasets have made training increasingly costly. While pipeline model parallelism makes it possible to train large models, existing pipeline techniques struggle to reduce bubble time because pipeline depth depends strongly on the number of GPUs. This paper introduces a novel pipeline reuse technology, PRT, which breaks the dependence of pipeline depth on the number of GPUs, allowing deeper pipelines even when the number of GPUs is limited. The paper also theoretically demonstrates the feasibility of PRT. Furthermore, the high orthogonality of PRT allows it to be applied to both unidirectional and bidirectional pipelines, further enhancing pipeline efficiency. PRT is evaluated on a server equipped with 8 GPUs, using BERT-series and ResNet-series models on the IMDB and mini-ImageNet datasets. Experimental results show that, for the BERT-series models, unidirectional and bidirectional pipelines with PRT achieve throughput improvements of up to 54.78% and 30.38%, respectively; for the ResNet-series models, the improvements reach up to 76.59% and 26.45%. Additionally, PRT achieves more balanced memory usage, validating its efficiency. |
|---|---|
| DOI: | 10.1109/CLUSTER59342.2025.11186481 |