Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing
With the advancement of serverless computing, running machine learning (ML) inference services over a serverless platform has been advocated, given its labor-free scalability and cost effectiveness. Mixture-of-Experts (MoE) models have been a dominant type of model architecture to enable large mode...
| Published in: | Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 1-10 |
|---|---|
| Main Authors: | , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 19.05.2025 |
| ISSN: | 2641-9874 |