Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing

With the advancement of serverless computing, running machine learning (ML) inference services on a serverless platform has been advocated, given its labor-free scalability and cost-effectiveness. Mixture-of-Experts (MoE) models have become a dominant type of model architecture for enabling large mode...

Bibliographic Details
Published in: Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 1-10
Main Authors: Liu, Mengfan; Wang, Wei; Wu, Chuan
Format: Conference Proceeding
Language: English
Published: IEEE, 19.05.2025
ISSN: 2641-9874