Scaling Up Optuna: P2P Distributed Hyperparameters Optimization
| Published in: | Concurrency and Computation, Vol. 37, no. 4–5 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | Hoboken, USA: John Wiley & Sons, Inc., 28.02.2025 |
| Series: | e70008 |
| ISSN: | 1532-0626, 1532-0634 |
| Summary: | ABSTRACT In machine learning (ML), hyperparameter optimization (HPO) is the process of choosing a tuple of values that ensures efficient deployment and training of an AI model. In practice, HPO applies not only to ML tuning but can also be used to tune complex numerical simulations. In this context, a numerical model of a given object is created for use in realistic simulations. This model is defined by a set of values describing properties such as the geometry of the object or other unknown parameters related to physical quantities. While HPO for ML usually requires finding a few parameters, a numerical model can involve tuning more than a hundred parameters. As a consequence, a large number of tuples have to be explored and evaluated before a relevant solution is found, raising new challenges in high-performance computing for efficiently driving the optimization. In this work we rely on the Optuna HPO framework, primarily designed for ML tasks and including state-of-the-art sampling and pruning algorithms. We report on its use to optimize a complex numerical model on a 1024-core machine. We suggest 1.5M tuples and evaluate 5M simulations using different Optuna-distributed layouts to build several trade-offs between performance and energy-consumption metrics. To further scale up the optimization process, we introduce OptunaP2P, an extension of Optuna based on the peer-to-peer paradigm, which removes any bottleneck in the management of the knowledge shared between optimization processes. With OptunaP2P, we computed up to 3 times faster than the regular Optuna-distributed implementation and obtained results of close-to-similar quality in this reduced time frame. |
|---|---|
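As a rough illustration of the workflow the abstract describes, the sketch below mimics several optimization workers that each sample parameter tuples, evaluate them against an objective, and publish improvements to a shared best-known result (the role played by Optuna's shared study storage, or by peer-to-peer exchange in OptunaP2P). This is a dependency-free toy, not code from the paper: `simulate`, `sample_tuple`, `run_worker`, and the shared-dictionary "storage" are all hypothetical stand-ins.

```python
import random

# Toy objective standing in for an expensive numerical simulation:
# minimize the squared distance of a 3-parameter tuple from a target.
TARGET = (2.0, -1.0, 0.5)

def simulate(params):
    """Evaluate one parameter tuple (the expensive step in real HPO)."""
    return sum((p - t) ** 2 for p, t in zip(params, TARGET))

def sample_tuple(rng):
    """Random sampler over a 3-parameter search space in [-10, 10]."""
    return tuple(rng.uniform(-10.0, 10.0) for _ in range(3))

def run_worker(seed, n_trials, shared_best):
    """One optimization process; shared_best mimics shared study storage."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        params = sample_tuple(rng)
        score = simulate(params)
        # Publish an improvement to the shared knowledge.
        if score < shared_best["score"]:
            shared_best["score"] = score
            shared_best["params"] = params

shared_best = {"score": float("inf"), "params": None}
for seed in range(8):  # 8 cooperating workers, run sequentially here
    run_worker(seed, n_trials=500, shared_best=shared_best)

print(shared_best["score"])  # best objective value found
```

In a real Optuna-distributed deployment the workers run concurrently and the shared state lives in a database-backed study; the paper's OptunaP2P variant instead exchanges this knowledge peer-to-peer so no central store becomes a bottleneck.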
| DOI: | 10.1002/cpe.70008 |