Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers

Training and fine-tuning large language models (LLMs) with hundreds of billions to trillions of parameters requires tens of thousands of GPUs, and a highly scalable software stack. In this work, we present a novel four-dimensional hybrid parallel algorithm implemented in a highly scalable, portable,...

Bibliographic details
Published in:	SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14
Main authors:	Singh, Siddharth; Singhania, Prajwal; Ranjan, Aditya; Kirchenbauer, John; Geiping, Jonas; Wen, Yuxin; Jain, Neel; Hans, Abhimanyu; Shu, Manli; Tomar, Aditya; Goldstein, Tom; Bhatele, Abhinav
Format:	Conference paper
Language:	English
Published:	IEEE, 17 November 2024
Subjects:	asynchrony; collective communication; Computational modeling; GPGPUs; High performance computing; Kernel; Large language models; Optimization; Parallel algorithms; parallel training; Supercomputers; Training; Training data; Transformers