ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Transformers have become keystone models in natural language processing over the past decade. They have achieved great popularity in deep learning applications, but the increasing sizes of the parameter spaces required by transformer models generate a commensurate need to accelerate performance. Nat...

Bibliographic details
Published in:	Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 344-355
Main authors:	Zhai, Yujia; Jiang, Chengquan; Wang, Leyuan; Jia, Xiaoying; Zhang, Shang; Chen, Zizhong; Liu, Xin; Zhu, Yibo
Format:	Conference paper
Language:	English
Published:	IEEE 01.05.2023
Subjects:	BERT; Bit error rate; CUTLASS; Deep learning; Distributed processing; Graphics processing units; Large Language Models; Multi-head Attention; Natural Language Processing; NVIDIA GPU; Optimization methods; Technological innovation; Training; Transformer
ISSN:	1530-2075