Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing

With the advancement of serverless computing, running machine learning (ML) inference services on a serverless platform has been advocated, given its labor-free scalability and cost-effectiveness. Mixture-of-Experts (MoE) models have become a dominant type of model architecture for enabling large mode...

Bibliographic Details
Published in: Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 1-10
Main Authors: Liu, Mengfan; Wang, Wei; Wu, Chuan
Format: Conference Proceeding
Language: English
Published: IEEE, 19.05.2025
ISSN: 2641-9874