Machine learning inference serving models in serverless computing: a survey

Published in: Computing, Volume 107, Issue 1, p. 47
Main authors: Aslani, Akram; Ghobaei-Arani, Mostafa
Format: Journal Article
Language: English
Publication details: Vienna: Springer Vienna, 01.01.2025 (Springer Nature B.V.)
ISSN: 0010-485X, 1436-5057
Description
Summary: Serverless computing has attracted many researchers with features such as scalability, optimization of operating costs, freedom from infrastructure management, and faster application development. Serverless computing can be used for real-time machine learning (ML) prediction via serverless inference functions. Deploying an ML serverless inference function involves provisioning a compute resource, deploying an ML model, configuring the network infrastructure, and granting permissions to invoke the inference function. However, machine learning inference (MLI) faces challenges such as resource management, latency and response time, large and complex models, and security and privacy, and relatively few studies have been conducted in this field. This comprehensive literature review examines recent developments in MLI in serverless computing environments. The mechanisms presented in the taxonomy fall into four categories: service-level-objective (SLO)-aware, acceleration-aware, framework-aware, and latency-aware. For each category, the methods and algorithms used to optimize inference in serverless environments are examined along with their advantages and disadvantages. We show that acceleration-aware methods focus on the optimal use of computing resources, while framework-aware methods play an important role in improving system efficiency and scalability by examining different frameworks for inference in serverless environments. Likewise, SLO-aware and latency-aware methods, which account for time constraints and service-level agreements, help deliver high-quality, reliable inference in serverless environments. Finally, the article presents a vision of future challenges and opportunities in this field and offers directions for future research on MLI in serverless computing.
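
To make the deployment steps mentioned in the summary concrete, here is a minimal, hypothetical sketch of a serverless ML inference function in the style of an AWS Lambda handler. The model path, payload shape, and model type are illustrative assumptions, not details taken from the article.

# Hypothetical sketch (not from the surveyed article): a minimal serverless
# ML inference function written as an AWS Lambda handler. The model path
# "/opt/model.pkl" and the JSON payload shape are illustrative assumptions.
import json
import pickle

# Load the model once at module import time so that warm invocations skip
# the deserialization cost; only cold starts pay this latency.
with open("/opt/model.pkl", "rb") as f:
    MODEL = pickle.load(f)

def handler(event, context):
    # Assume the request body carries a JSON object with a "features" key
    # holding a list of feature vectors, e.g. {"features": [[1.0, 2.0]]}.
    features = json.loads(event["body"])["features"]
    predictions = MODEL.predict(features).tolist()
    return {
        "statusCode": 200,
        "body": json.dumps({"predictions": predictions}),
    }

Loading the model outside the handler is one instance of the latency trade-offs the survey's latency-aware category concerns: it shifts cost onto cold starts in exchange for lower per-request response time.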
DOI: 10.1007/s00607-024-01377-9