Celerity-RSim: Porting Light Propagation Simulation to Accelerator Clusters Using a High-Level API


Detailed bibliography
Published in: International Journal of Parallel Programming, Volume 53, Issue 3, p. 17
Main authors: Thoman, Peter; Gschwandtner, Philipp; Molina Heredina, Facundo; Fahringer, Thomas
Format: Journal Article
Language: English
Published: New York: Springer US, 01.06.2025 (Springer Nature B.V.)
ISSN: 0885-7458, 1573-7640
Description
Summary: Time-of-Flight (ToF) camera systems are increasingly capable of analyzing larger 3D spaces and providing more detailed and precise results. To increase the speed-to-solution during development, testing and validation of such systems, light propagation simulation is employed. One such simulation, RSim, was previously performed on single workstations; however, the increase in detail required for newer ToF hardware necessitates cluster-level parallelism in order to maintain an experiment latency that enables productive design work. Celerity is a high-level parallel API and runtime system for clusters of accelerators, intended to simplify the development of domain science applications. It automatically manages data and work distribution, while also transparently enabling asynchronous overlapping of computation and communication. In this paper, we present a use case study of porting the full RSim application to GPU clusters using the Celerity system. In order to improve scalability, a new parallelization scheme was employed for the core simulation task, and Celerity was extended with a high-level split constraints feature which enables this scheme. We present strong- and weak-scaling experiments for the resulting application on three accelerator clusters with up to 128 GPUs, and also evaluate the relative programming effort required to distribute the application across multiple GPUs using different APIs.
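To illustrate the general idea behind the split constraints mentioned in the abstract, the following is a minimal, language-agnostic Python sketch (not the Celerity API; all names here are hypothetical): a runtime dividing a global iteration range across devices must sometimes honor a minimum granularity so that each chunk aligns with the simulation's data layout.

```python
def constrained_split(total, num_chunks, granularity):
    """Split the range [0, total) into num_chunks contiguous chunks
    whose sizes are all multiples of `granularity`.

    This mimics, in spirit, how a distributed runtime might split a
    kernel's iteration space across devices under a split constraint.
    """
    if total % granularity != 0:
        raise ValueError("total must be divisible by granularity")
    units = total // granularity          # number of indivisible blocks
    base, rem = divmod(units, num_chunks) # distribute blocks evenly
    chunks, start = [], 0
    for i in range(num_chunks):
        size = (base + (1 if i < rem else 0)) * granularity
        chunks.append((start, start + size))
        start += size
    return chunks

# e.g. 1000 work items over 3 devices, constrained to blocks of 100:
# each chunk boundary falls on a multiple of 100.
print(constrained_split(1000, 3, 100))
```

Without such a constraint, an even split of 1000 items over 3 devices would produce chunk boundaries (334, 667) that cut through the 100-item blocks the simulation's data layout assumes.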
DOI: 10.1007/s10766-025-00787-2