Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing

With the advancement of serverless computing, running machine learning (ML) inference services on a serverless platform has been advocated, given its labor-free scalability and cost effectiveness. Mixture-of-Experts (MoE) models have been a dominant type of model architecture to enable large mode...


Detailed bibliography
Published in:	Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 1-10
Main authors:	Liu, Mengfan; Wang, Wei; Wu, Chuan
Format:	Conference paper
Language:	English
Published:	IEEE, 19 May 2025
Subjects:	Bayes methods; Computational modeling; Costs; Optimization; Pipelines; Prediction algorithms; Predictive models; Scalability; Serverless computing; Throughput
ISSN:	2641-9874