Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing

Deep reinforcement learning (DRL) has achieved remarkable success in diverse areas, including gaming AI, scientific simulations, and large-scale (HPC) system scheduling. DRL training, which involves a trial-and-error process, demands considerable time and computational resources. To overcome this ch...

Bibliographic Details
Published in:	SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17
Main Authors:	Yu, Hanfei; Wang, Hao; Tiwari, Devesh; Li, Jian; Park, Seung-Jong
Format:	Conference Proceeding
Language:	English
Published:	IEEE, 17 November 2024
Subjects:	Costs; Dynamic scheduling; Heuristic algorithms; High performance computing; Monte Carlo methods; Processor scheduling; Scalability; Serverless computing; Training; Transient analysis
Online Access:	Full text