An Empirical Study on Adaptive Inference for Pretrained Language Model

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, No. 8, pp. 4321–4331
Main Authors: Liu, Weijie; Zhao, Xin; Zhao, Zhe; Ju, Qi; Yang, Xuefeng; Lu, Wei
Format: Journal Article
Language: English
Published: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), United States, 01.08.2023
ISSN: 2162-237X, 2162-2388
Online Access: Full text
Description
Abstract: Adaptive inference has been shown to improve the inference speed of bidirectional encoder representations from transformers (BERT) with minimal loss of accuracy. However, existing work focuses only on the BERT model and leaves other pretrained language models (PLMs) unexplored. This article therefore conducts an empirical study of the adaptive inference mechanism applied to various PLMs, including generative pretraining (GPT), GCNN, ALBERT, and TinyBERT. The mechanism is verified on both English and Chinese benchmarks, and experimental results demonstrate that, depending on the chosen speed threshold, it achieves speedups ranging from 1 to 10 times. In addition, its application to ALBERT shows that adaptive inference is compatible with parameter sharing, achieving model compression and acceleration simultaneously, while its application to TinyBERT shows that it can further accelerate an already distilled small model. To address the problem that a large label set renders adaptive inference ineffective, the article also proposes a solution, namely label reduction. Finally, the article open-sources an easy-to-use toolkit called FastPLM to help developers adopt pretrained models with adaptive inference capabilities in their applications.
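Adaptive inference of the kind described in the abstract is typically realized as early exiting: a lightweight classifier attached to each encoder layer predicts the label, and inference stops as soon as that prediction is confident enough, with the speed threshold controlling how much confidence is required. Below is a minimal PyTorch sketch of such an uncertainty-based early-exit loop; the names layers, exit_classifiers, and the speed threshold are illustrative assumptions for this sketch, not the FastPLM API.

import math
import torch
import torch.nn.functional as F

def normalized_entropy(logits):
    # Prediction uncertainty, scaled to [0, 1] by the maximum entropy log(K)
    # for K classes, so one threshold works across tasks.
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
    return entropy / math.log(logits.size(-1))

def adaptive_forward(hidden, layers, exit_classifiers, speed):
    # Run encoder layers one at a time; after each, a lightweight classifier
    # predicts from the [CLS] position. If every example in the batch is
    # confident enough (normalized entropy below `speed`), skip the rest.
    for layer, classifier in zip(layers, exit_classifiers):
        hidden = layer(hidden)
        logits = classifier(hidden[:, 0])
        if normalized_entropy(logits).max().item() < speed:
            return logits  # confident: remaining layers are skipped
    return logits  # no early exit; this is the full-depth prediction

A smaller speed threshold demands more confidence before exiting, so more layers run on average; varying this single knob trades accuracy for latency, which is how the different speed thresholds in the abstract yield speedups anywhere from 1 to 10 times.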
DOI: 10.1109/TNNLS.2021.3114188