An Empirical Study on Adaptive Inference for Pretrained Language Model

Bibliographic details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, No. 8, pp. 4321-4331
Main authors: Liu, Weijie; Zhao, Xin; Zhao, Zhe; Ju, Qi; Yang, Xuefeng; Lu, Wei
Format: Journal Article
Language: English
Published: United States, IEEE, 01.08.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2162-237X, 2162-2388
Description
Summary: Adaptive inference has been shown to improve the inference speed of bidirectional encoder representations from transformers (BERT) with minimal loss of accuracy. However, current work focuses only on the BERT model and lacks exploration of other pretrained language models (PLMs). Therefore, this article conducts an empirical study on the application of the adaptive inference mechanism to various PLMs, including generative pretraining (GPT), GCNN, ALBERT, and TinyBERT. The mechanism is verified on both English and Chinese benchmarks, and experimental results demonstrate that it can speed up inference by a factor ranging from 1 to 10, depending on the speed threshold. In addition, its application to ALBERT shows that adaptive inference can work with parameter sharing, achieving model compression and acceleration simultaneously, while its application to TinyBERT shows that it can further accelerate a distilled small model. To address the problem that adaptive inference becomes ineffective when there are too many labels, this article also proposes a solution, namely label reduction. Finally, this article open-sources an easy-to-use toolkit called FastPLM to help developers adopt pretrained models with adaptive inference capabilities in their applications.
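The summary does not state how the exit decision is made. As a rough illustration only, the sketch below assumes a common early-exit design: each layer has its own classifier, and inference stops at the first layer whose prediction uncertainty (here, entropy normalized by the number of labels) falls below the speed threshold. The names `normalized_entropy` and `adaptive_predict` are hypothetical and are not part of the FastPLM toolkit.

```python
import math
from typing import Iterable, Sequence, Tuple


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy of a label distribution, scaled to [0, 1] by log(number of labels)."""
    n = len(probs)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(n)


def adaptive_predict(
    layer_outputs: Iterable[Sequence[float]], speed: float
) -> Tuple[int, int]:
    """Return (predicted label, number of layers evaluated).

    `layer_outputs` yields one label distribution per layer, shallowest first.
    Passing a generator means deeper layers are never computed after an exit.
    Assumption: a layer is confident enough when its uncertainty < `speed`.
    """
    label, depth = -1, 0
    for depth, probs in enumerate(layer_outputs, start=1):
        label = max(range(len(probs)), key=lambda i: probs[i])
        if normalized_entropy(probs) < speed:
            break  # confident enough: skip the remaining layers
    return label, depth


# Toy per-layer distributions standing in for real classifier outputs.
layers = [[0.5, 0.5], [0.8, 0.2], [0.99, 0.01]]
print(adaptive_predict(iter(layers), speed=0.8))  # higher threshold, earlier exit
print(adaptive_predict(iter(layers), speed=0.1))  # lower threshold, deeper exit
```

Under this assumption, a higher threshold lets more inputs exit at shallow layers (faster, possibly less accurate), and a threshold of 0 always runs the full model.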
DOI: 10.1109/TNNLS.2021.3114188