Memory-efficient tensor parallelism for long-sequence Transformer training

Transformer-based models like large language models (LLMs) have attracted significant attention in recent years due to their superior performance. A long sequence of input tokens is essential for industrial LLMs to provide better user services. However, memory consumption increases quadratically wit...

Bibliographic details
Published in:	Frontiers of information technology & electronic engineering, Volume 26, Issue 5, pp. 770-787
Main authors:	Liang, Peng; Qiao, Linbo; Shi, Yanqi; Zheng, Hao; Tang, Yu; Li, Dongsheng
Format:	Journal Article
Language:	English
Published:	Hangzhou Zhejiang University Press 1 May 2025 Springer Nature B.V
Subjects:	Communication Communications Engineering Computation Computer Hardware Computer memory Computer Science Computer Systems Organization and Communication Networks Electrical Engineering Electronics and Microelectronics Graphics processing units Instrumentation Large language models Networks Parallel processing Research Article Tensors Tensor parallelism Memory efficiency 分布式学习张量并行大规模语言模型机器学习系统 Large language model (LLM) Machine learning system Distributed learning TP183 Long sequence 长序列内存高效
ISSN:	2095-9184, 2095-9230