Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing

Bibliographic details
Published in:	SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17
Main authors:	Yu, Hanfei; Wang, Hao; Tiwari, Devesh; Li, Jian; Park, Seung-Jong
Format:	Conference paper
Language:	English
Published:	IEEE, 17 November 2024
Subjects:	Costs; Dynamic scheduling; Heuristic algorithms; High performance computing; Monte Carlo methods; Processor scheduling; Scalability; Serverless computing; Training; Transient analysis

Description
Summary:	Deep reinforcement learning (DRL) has achieved remarkable success in diverse areas, including gaming AI, scientific simulations, and large-scale (HPC) system scheduling. DRL training, which involves a trial-and-error process, demands considerable time and computational resources. To overcome this challenge, distributed DRL algorithms and frameworks have been developed to expedite training by leveraging large-scale resources. However, existing distributed DRL solutions rely on synchronous learning with serverful infrastructures, suffering from low training efficiency and overwhelming training costs. This paper proposes Stellaris, the first to introduce a generic asynchronous learning paradigm for distributed DRL training with serverless computing. We devise an importance sampling truncation technique to stabilize DRL training and develop a staleness-aware gradient aggregation method tailored to the dynamic staleness in asynchronous serverless DRL training. Experiments on AWS EC2 regular testbeds and HPC clusters show that Stellaris outperforms existing state-of-the-art DRL baselines by achieving 2.2× higher rewards (i.e., training quality) and reducing training costs by 41%.
DOI:	10.1109/SC41406.2024.00045