Scaling Up Optuna: P2P Distributed Hyperparameters Optimization
| Published in: | Concurrency and Computation, Vol. 37, no. 4–5 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | Hoboken, USA: John Wiley & Sons, Inc., 28.02.2025 |
| Series: | e70008 |
| ISSN: | 1532-0626, 1532-0634 |
| Summary: | ABSTRACT In machine learning (ML), hyperparameter optimization (HPO) is the process of choosing a tuple of values that ensures efficient deployment and training of an AI model. In practice, HPO applies not only to ML tuning but can also be used to tune complex numerical simulations. In this context, a numerical model of a given object is created for use in realistic simulations. This model is defined by a set of values describing properties such as the geometry of the object or other unknown parameters related to physical quantities. While HPO for ML usually requires finding a few parameters, a numerical model can involve tuning more than a hundred parameters. As a consequence, a large number of tuples have to be explored and evaluated before a relevant solution is found, raising new challenges in high-performance computing for efficiently driving the optimization. In this work we rely on the Optuna HPO framework, primarily designed for ML tasks and including state-of-the-art sampling and pruning algorithms. We report on its use to optimize a complex numerical model on a 1024-core machine. We suggest 1.5M tuples and evaluate 5M simulations using different Optuna-distributed layouts to build several trade-offs between performance and energy-consumption metrics. To further scale up the optimization process, we introduce OptunaP2P, an extension of Optuna based on the peer-to-peer paradigm, which removes any bottleneck in the management of the knowledge shared between optimization processes. With OptunaP2P, we computed up to 3 times faster than the regular Optuna-distributed implementation and obtained results of close-to-similar quality in this reduced time frame. |
|---|---|
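As a rough illustration of the workflow the abstract describes, the sketch below mimics several optimization workers that each sample parameter tuples, evaluate them against an objective, and publish improvements to a shared best-known result (the role played by Optuna's shared study storage, or by peer-to-peer exchange in OptunaP2P). This is a dependency-free toy, not code from the paper: `simulate`, `sample_tuple`, `run_worker`, and the shared-dictionary "storage" are all hypothetical stand-ins.

```python
import random

# Toy objective standing in for an expensive numerical simulation:
# minimize the squared distance of a 3-parameter tuple from a target.
TARGET = (2.0, -1.0, 0.5)

def simulate(params):
    """Evaluate one parameter tuple (the expensive step in real HPO)."""
    return sum((p - t) ** 2 for p, t in zip(params, TARGET))

def sample_tuple(rng):
    """Random sampler over a 3-parameter search space in [-10, 10]."""
    return tuple(rng.uniform(-10.0, 10.0) for _ in range(3))

def run_worker(seed, n_trials, shared_best):
    """One optimization process; shared_best mimics shared study storage."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        params = sample_tuple(rng)
        score = simulate(params)
        # Publish an improvement to the shared knowledge.
        if score < shared_best["score"]:
            shared_best["score"] = score
            shared_best["params"] = params

shared_best = {"score": float("inf"), "params": None}
for seed in range(8):  # 8 cooperating workers, run sequentially here
    run_worker(seed, n_trials=500, shared_best=shared_best)

print(shared_best["score"])  # best objective value found
```

In a real Optuna-distributed deployment the workers run concurrently and the shared state lives in a database-backed study; the paper's OptunaP2P variant instead exchanges this knowledge peer-to-peer so no central store becomes a bottleneck.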
| DOI: | 10.1002/cpe.70008 |