Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing

Deep reinforcement learning (DRL) has achieved remarkable success in diverse areas, including gaming AI, scientific simulations, and large-scale high-performance computing (HPC) system scheduling. DRL training, which involves a trial-and-error process, demands considerable time and computational resources. To overcome this ch...

Bibliographic Details
Published in:	SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17
Main Authors:	Yu, Hanfei; Wang, Hao; Tiwari, Devesh; Li, Jian; Park, Seung-Jong
Format:	Conference Proceeding
Language:	English
Published:	IEEE 17.11.2024
Subjects:	Costs; Dynamic scheduling; Heuristic algorithms; High performance computing; Monte Carlo methods; Processor scheduling; Scalability; Serverless computing; Training; Transient analysis