Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices

Bibliographic Details
Published in: 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 1003-1008
Main Authors: Song, Yuhong; Jiang, Weiwen; Li, Bingbing; Qi, Panjie; Zhuge, Qingfeng; Sha, Edwin Hsing-Mean; Dasgupta, Sakyasingha; Shi, Yiyu; Ding, Caiwen
Format: Conference Paper
Language: English
Published: IEEE, 5 December 2021
Description
Summary: A pruning-based AutoML framework for run-time reconfigurability, namely RT3, is proposed in this work. It enables large Transformer-based Natural Language Processing (NLP) models to be executed efficiently on resource-constrained mobile devices and reconfigured (i.e., switching models for dynamic hardware conditions) at run-time. Such reconfigurability is key to saving energy on battery-powered mobile devices, which widely use the dynamic voltage and frequency scaling (DVFS) technique for hardware reconfiguration to prolong battery life. In this work, we explore a hybrid of block-structured pruning (BP) and pattern pruning (PP) for Transformer-based models and make a first attempt to combine hardware and software reconfiguration to maximize energy savings on battery-powered mobile devices. Specifically, RT3 integrates two levels of optimization. First, it uses an efficient BP as the first-step compression for resource-constrained mobile devices. Then, RT3 heuristically generates a shrunken search space based on the first-level optimization and uses reinforcement learning to search for multiple pattern sets with diverse sparsity for PP. These pattern sets support lightweight software reconfiguration and correspond to the available frequency levels of DVFS (i.e., hardware reconfiguration). At run-time, RT3 can switch between the lightweight pattern sets within 45 ms to guarantee the required real-time constraint at different frequency levels. Results further show that RT3 can prolong battery life by more than 4× with less than 1% accuracy loss for Transformer and 1.5% score decrease for DistilBERT.
DOI: 10.1109/DAC18074.2021.9586295