FastTuning: Enabling Fast and Efficient Hyper-Parameter Tuning With Partitioning and Parallelism of Search Space

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 35, No. 7, pp. 1174-1188
Main Authors: Li, Xiaqing, Guo, Qi, Zhang, Guangyan, Ye, Siwei, He, Guanhua, Yao, Yiheng, Zhang, Rui, Hao, Yifan, Du, Zidong, Zheng, Weimin
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2024
ISSN: 1045-9219, 1558-2183
Description
Summary: Hyper-parameter tuning (HPT) for deep learning (DL) models is prohibitively expensive. Sequential model-based optimization (SMBO) has emerged as the state-of-the-art (SOTA) approach to automatically optimizing HPT performance due to its heuristic advantages. Unfortunately, because they focus on algorithm optimization rather than on a large-scale parallel HPT system, existing SMBO-based approaches still cannot effectively remove their strong sequential nature, posing two performance problems: (1) extremely low tuning speed and (2) sub-optimal model quality. In this paper, we propose FastTuning, a fast, scalable, and generic system that accelerates SMBO-based HPT for large DL/ML models in parallel. The key is to partition the highly complex search space into multiple smaller sub-spaces, each of which is assigned to and optimized by a different tuning worker in parallel. However, determining the right level of resource allocation to strike a balance between quality and cost remains a challenge. To address this, we further propose NIMBLE, a dynamic scheduling strategy designed specifically for FastTuning, comprising (1) a Dynamic Elimination Algorithm, (2) Sub-space Re-division, and (3) Posterior Information Sharing. Finally, we incorporate 6 SOTAs (i.e., 3 tuning algorithms and 3 parallel tuning tools) into FastTuning. Experimental results on ResNet18, VGG19, ResNet50, and ResNet152 show that FastTuning consistently offers much faster tuning speed (up to 80×) with better accuracy (up to a 4.7% improvement), thereby enabling the application of automatic HPT to real-life DL models.
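To make the partition-and-parallelize idea from the summary concrete, below is a minimal Python sketch of splitting a hyper-parameter search space into sub-spaces and tuning each in a separate worker. All names (SPACE, objective, partition, tune_worker) are hypothetical, and a simple random search stands in for the paper's SMBO inner loop; this is an illustration under those assumptions, not the authors' implementation.

import math
import random
from concurrent.futures import ProcessPoolExecutor

# Hypothetical 2-D search space: learning-rate and batch-size ranges.
SPACE = {"lr": (1e-5, 1e-1), "batch_size": (16.0, 256.0)}

def objective(cfg):
    # Stand-in for training a DL model and returning validation accuracy:
    # a synthetic score peaked near lr = 1e-2 and batch_size = 64.
    return -((math.log10(cfg["lr"]) + 2.0) ** 2) \
           - ((cfg["batch_size"] - 64.0) / 64.0) ** 2

def partition(space, key, k):
    # Split the search space into k smaller sub-spaces along one dimension,
    # mirroring FastTuning's partitioning of the global space.
    lo, hi = space[key]
    step = (hi - lo) / k
    subs = []
    for i in range(k):
        sub = dict(space)
        sub[key] = (lo + i * step, lo + (i + 1) * step)
        subs.append(sub)
    return subs

def tune_worker(args):
    # One tuning worker optimizes its own sub-space in isolation.
    # Random search is used here as a placeholder for an SMBO loop.
    sub_space, budget, seed = args
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in sub_space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

if __name__ == "__main__":
    sub_spaces = partition(SPACE, key="lr", k=4)
    # Each sub-space is assigned to a different worker and tuned in parallel.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(tune_worker,
                                [(s, 50, i) for i, s in enumerate(sub_spaces)]))
    best_cfg, best_score = max(results, key=lambda r: r[1])
    print("best config:", best_cfg, "score:", best_score)

In the paper, NIMBLE would additionally reallocate resources across sub-spaces over time (dynamic elimination, sub-space re-division, and posterior information sharing), which this static sketch omits.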
DOI: 10.1109/TPDS.2024.3386939