QoS-Aware and Cost-Efficient Dynamic Resource Allocation for Serverless ML Workflows
| Title: | QoS-Aware and Cost-Efficient Dynamic Resource Allocation for Serverless ML Workflows |
|---|---|
| Authors: | Wu, Hao, Deng, Junxiao, Fan, Hao, Ibrahim, Shadi, Wu, Song, Jin, Hai |
| Contributors: | Huazhong University of Science and Technology (HUST), Wuhan; MYRIADS team, Inria Rennes – Bretagne Atlantique; Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA): Université de Rennes (UR), INSA Rennes, Université de Bretagne Sud (UBS), ENS Rennes, Inria, CentraleSupélec, CNRS, IMT Atlantique |
| Source: | IPDPS - 2023 IEEE International Parallel and Distributed Processing Symposium, May 2023, St. Petersburg, United States, pp. 886-896, ⟨10.1109/IPDPS54959.2023.00093⟩. https://hal.science/hal-04389016 |
| Publisher Information: | HAL CCSD; IEEE |
| Publication Year: | 2023 |
| Collection: | Université de Rennes 1: Publications scientifiques (HAL) |
| Subject Terms: | serverless computing, distributed machine learning, resource provisioning, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] |
| Subject Geographic: | St. Petersburg, United States |
| Description: | Machine Learning (ML) workflows are increasingly deployed on serverless computing platforms to benefit from their elasticity and fine-grain pricing. Proper resource allocation is crucial to achieve fast and cost-efficient execution of serverless ML workflows (especially for hyperparameter tuning and model training). Unfortunately, existing resource allocation methods are static, treat functions equally, and rely on offline prediction, which limits their efficiency. In this paper, we introduce CE-scaling - a Cost-Efficient autoscaling framework for serverless ML workflows. During hyperparameter tuning, CE-scaling partitions resources across stages according to their exact usage to minimize resource waste. Moreover, it incorporates an online prediction method to dynamically adjust resources during model training. We implement and evaluate CE-scaling on AWS Lambda using various ML models. Evaluation results show that compared to state-of-the-art static resource allocation methods, CE-scaling can reduce the job completion time and the monetary cost by up to 63% and 41% for hyperparameter tuning, respectively; and by up to 58% and 38% for model training. |
| Document Type: | conference object |
| Language: | English |
| DOI: | 10.1109/IPDPS54959.2023.00093 |
| Availability: | https://hal.science/hal-04389016 https://hal.science/hal-04389016v1/document https://hal.science/hal-04389016v1/file/IPDPS-2023-CR.pdf https://doi.org/10.1109/IPDPS54959.2023.00093 |
| Rights: | http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess |
| Accession Number: | edsbas.3C1AE584 |
| Database: | BASE |