Memory-efficient tensor parallelism for long-sequence Transformer training

Transformer-based models like large language models (LLMs) have attracted significant attention in recent years due to their superior performance. A long sequence of input tokens is essential for industrial LLMs to provide better user services. However, memory consumption increases quadratically wit...

Bibliographic details
Published in:	Frontiers of information technology & electronic engineering, Vol. 26, No. 5, pp. 770-787
Main authors:	Liang, Peng; Qiao, Linbo; Shi, Yanqi; Zheng, Hao; Tang, Yu; Li, Dongsheng
Format:	Journal Article
Language:	English
Publication details:	Hangzhou Zhejiang University Press 01.05.2025 Springer Nature B.V
Subjects:	Communication; Communications Engineering; Computation; Computer Hardware; Computer memory; Computer Science; Computer Systems Organization and Communication Networks; Electrical Engineering; Electronics and Microelectronics; Graphics processing units; Instrumentation; Large language models; Networks; Parallel processing; Research Article; Tensors; Tensor parallelism; Memory efficiency; 分布式学习; 张量并行; 大规模语言模型; 机器学习系统; Large language model (LLM); Machine learning system; Distributed learning; TP183; Long sequence; 长序列; 内存高效
ISSN:	2095-9184, 2095-9230