FastTuning: Enabling Fast and Efficient Hyper-Parameter Tuning With Partitioning and Parallelism of Search Space

Detailed bibliography
Published in: IEEE Transactions on Parallel and Distributed Systems, Volume 35, Issue 7, pp. 1174-1188
Main authors: Li, Xiaqing, Guo, Qi, Zhang, Guangyan, Ye, Siwei, He, Guanhua, Yao, Yiheng, Zhang, Rui, Hao, Yifan, Du, Zidong, Zheng, Weimin
Medium: Journal Article
Language: English
Publication details: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2024
ISSN: 1045-9219, 1558-2183
Description
Summary: Hyper-parameter tuning (HPT) for deep learning (DL) models is prohibitively expensive. Sequential model-based optimization (SMBO) has emerged as the state-of-the-art (SOTA) approach to automatically optimizing HPT performance due to its heuristic advantages. Unfortunately, because they focus on algorithm optimization rather than a large-scale parallel HPT system, existing SMBO-based approaches still cannot effectively remove their strong sequential nature, posing two performance problems: (1) extremely low tuning speed and (2) sub-optimal model quality. In this paper, we propose FastTuning, a fast, scalable, and generic system that accelerates SMBO-based HPT for large DL/ML models in parallel. The key is to partition the highly complex search space into multiple smaller sub-spaces, each of which is assigned to and optimized by a different tuning worker in parallel. However, determining the right level of resource allocation to strike a balance between quality and cost remains a challenge. To address this, we further propose NIMBLE, a dynamic scheduling strategy designed specifically for FastTuning, comprising (1) a Dynamic Elimination Algorithm, (2) Sub-space Re-division, and (3) Posterior Information Sharing. Finally, we incorporate 6 SOTAs (i.e., 3 tuning algorithms and 3 parallel tuning tools) into FastTuning. Experimental results on ResNet18, VGG19, ResNet50, and ResNet152 show that FastTuning consistently offers much faster tuning (up to 80×) with better accuracy (up to a 4.7% improvement), thereby enabling the application of automatic HPT to real-life DL models.
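To illustrate the core idea summarized above, the following Python sketch partitions a toy hyper-parameter search space into disjoint sub-spaces and optimizes each sub-space with an independent worker in parallel. This is a minimal illustration only, not FastTuning's implementation or the NIMBLE scheduler: the names (partition_space, tune_subspace, evaluate) are hypothetical, the objective is a toy surrogate, and random search stands in for the per-worker SMBO step.

import random
from concurrent.futures import ProcessPoolExecutor


def evaluate(config):
    # Hypothetical stand-in for training a DL model and returning its
    # validation accuracy for one hyper-parameter configuration.
    return 1.0 - abs(config["lr"] - 0.01) * 10 - abs(config["batch_size"] - 128) / 1024


def partition_space(space, num_workers):
    # Split the full search space into disjoint sub-spaces, one per worker.
    # A simple round-robin split over the learning-rate axis is used here;
    # the paper's actual partitioning strategy may differ.
    lr_chunks = [space["lr"][i::num_workers] for i in range(num_workers)]
    return [{"lr": chunk, "batch_size": space["batch_size"]} for chunk in lr_chunks]


def tune_subspace(subspace, budget=20, seed=0):
    # Per-worker tuner. A real worker would run an SMBO algorithm
    # (e.g., Bayesian optimization); random search keeps the sketch short.
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = {"lr": rng.choice(subspace["lr"]),
               "batch_size": rng.choice(subspace["batch_size"])}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


if __name__ == "__main__":
    full_space = {"lr": [10.0 ** -e for e in range(1, 6)],
                  "batch_size": [32, 64, 128, 256]}
    subspaces = partition_space(full_space, num_workers=2)
    # Each sub-space is optimized by its own worker in parallel.
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(tune_subspace, subspaces))
    best = max(results, key=lambda r: r[1])
    print("best configuration:", best[0], "score:", best[1])

In this sketch the final result is simply the best configuration across workers; FastTuning's NIMBLE scheduler additionally eliminates unpromising sub-spaces, re-divides them, and shares posterior information between workers during tuning.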
DOI: 10.1109/TPDS.2024.3386939