Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing

With the advancement of serverless computing, running machine learning (ML) inference services on a serverless platform has been advocated, given its labor-free scalability and cost effectiveness. Mixture-of-Experts (MoE) models have become a dominant type of model architecture to enable large mode...

Detailed Bibliography
Published in:	Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 1-10
Main authors:	Liu, Mengfan; Wang, Wei; Wu, Chuan
Medium:	Conference paper
Language:	English
Publication details:	IEEE, 19 May 2025
Subjects:	Bayes methods; Computational modeling; Costs; Optimization; Pipelines; Prediction algorithms; Predictive models; Scalability; Serverless computing; Throughput
ISSN:	2641-9874