CAPTURE: Memory-Centric Partitioning for Distributed DNN Training with Hybrid Parallelism

Deep Learning (DL) model sizes are increasing at a rapid pace, as larger models typically offer better statistical performance. Modern Large Language Models (LLMs) and image processing models contain billions of trainable parameters. Training such massive neural networks incurs significant memory re...

Bibliographic details
Published in:	Proceedings - International Conference on High Performance Computing, pp. 76-86
Main authors:	Dreuning, Henk; Verstoep, Kees; Bal, Henri E.; van Nieuwpoort, Rob V.
Format:	Conference paper
Language:	English
Published:	IEEE, 18 December 2023
Subjects:	Costs; Deep Learning; GPU; Graphics processing units; HPC; Hybrid Parallelism; Memory; Memory management; Predictive models; Tensors; Throughput; Training
ISSN:	2640-0316