Task-Based HPC in the Cloud: Price-Performance Analysis of N-Body Simulations with StarPU
| Published in: | Proceedings of the IEEE International Conference on Cloud Engineering, pp. 25-35 |
|---|---|
| Main authors: | , , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 23.09.2025 |
| Subjects: | |
| ISSN: | 2694-0825 |
| Summary: | Public cloud environments present significant challenges for traditional High Performance Computing (HPC) applications due to infrastructure limitations that differ substantially from dedicated HPC systems. Unlike traditional HPC clusters optimized for tightly coupled parallel workloads, cloud platforms were designed primarily for web services and data processing applications. Key obstacles include high-latency networks, hardware virtualization overhead, and limited availability of specialized accelerators, all of which can severely impact the performance of compute-intensive applications such as physics simulations. This study investigates the feasibility of running HPC workloads on public cloud infrastructure using standard and cost-effective instance configurations rather than expensive specialized "HPC" offerings. We deploy heterogeneous clusters on Amazon Web Services using the HPC@Cloud Toolkit, incorporating various instance types, including GPU-accelerated nodes with different computational capabilities. Our evaluation focuses on N-body simulations implemented using a task-based parallel programming model, leveraging the StarPU runtime system to dynamically schedule computational tasks across various processing units. Our experimental results demonstrate three key findings: (1) smaller GPU-equipped instances (g6.2xlarge) achieve performance comparable to larger instances while costing approximately one-sixth the price, challenging conventional scaling assumptions for cloud-based HPC; (2) strategic GPU utilization yields up to 8.2× performance improvements over CPU-only configurations while reducing total execution costs by 24.4×; and (3) while task-based programming models effectively address network limitations through dynamic scheduling, complex tree-based algorithms like TBFMM face significant optimization challenges in cloud environments due to load balancing issues and expensive parameter tuning requirements. These findings provide practical guidance for researchers and practitioners seeking cost-effective cloud HPC deployments, demonstrating that commodity cloud infrastructures can be viable for regular computational workloads but require careful algorithmic-resource matching for optimal efficiency. |
| DOI: | 10.1109/IC2E65552.2025.00009 |
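
The abstract centers on StarPU's task-based execution model, in which each computational kernel is declared as a codelet with one or more per-architecture implementations, and the runtime dynamically schedules submitted tasks onto available CPU or GPU workers. The sketch below is a minimal illustration of that model using a trivial vector-scaling codelet; it is not the paper's N-body or TBFMM kernel, and the problem size and scaling factor are placeholders chosen only for illustration.

```c
/* Minimal StarPU sketch (illustrative only): a vector-scaling codelet with a
 * CPU implementation; StarPU's scheduler decides where each task runs.
 * Build with, e.g.: gcc scale.c $(pkg-config --cflags --libs starpu-1.4) */
#include <starpu.h>
#include <stdint.h>

#define N 1024  /* placeholder problem size, not taken from the paper */

/* CPU implementation of the codelet: scale every element in place. */
static void scale_cpu(void *buffers[], void *cl_arg)
{
    (void)cl_arg;
    float *v = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    for (unsigned i = 0; i < n; i++)
        v[i] *= 2.0f;
}

/* A codelet bundles the kernel's implementations; adding a .cuda_funcs entry
 * would let the same task also run on GPU workers (e.g. on g6 instances). */
static struct starpu_codelet scale_cl = {
    .cpu_funcs = { scale_cpu },
    .nbuffers  = 1,
    .modes     = { STARPU_RW },
};

int main(void)
{
    static float vec[N];
    for (unsigned i = 0; i < N; i++)
        vec[i] = (float)i;

    if (starpu_init(NULL) != 0)
        return 1;

    /* Register the data so the runtime can track it and manage transfers. */
    starpu_data_handle_t handle;
    starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                (uintptr_t)vec, N, sizeof(vec[0]));

    /* Submit a task; StarPU picks a worker and executes it asynchronously. */
    starpu_task_insert(&scale_cl, STARPU_RW, handle, 0);

    starpu_task_wait_for_all();   /* block until all submitted tasks finish */
    starpu_data_unregister(handle);
    starpu_shutdown();
    return 0;
}
```

In this model the application only declares tasks and their data dependencies; the decision of whether a task runs on a CPU core or a GPU is left to the runtime, which is the property the paper exploits to tolerate cloud network latency through dynamic scheduling.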