Celerity-RSim: Porting Light Propagation Simulation to Accelerator Clusters Using a High-Level API


Detailed bibliography
Published in: International Journal of Parallel Programming, Volume 53, Issue 3, p. 17
Main authors: Thoman, Peter; Gschwandtner, Philipp; Molina Heredina, Facundo; Fahringer, Thomas
Format: Journal Article
Language: English
Published: New York: Springer US, 01.06.2025 (Springer Nature B.V.)
ISSN: 0885-7458, 1573-7640
Description
Summary: Time-of-Flight (ToF) camera systems are increasingly capable of analyzing larger 3D spaces and providing more detailed and precise results. To increase the speed-to-solution during development, testing and validation of such systems, light propagation simulation is employed. One such simulation, RSim, was previously performed on single workstations; however, the increase in detail required for newer ToF hardware necessitates cluster-level parallelism in order to maintain an experiment latency that enables productive design work. Celerity is a high-level parallel API and runtime system for clusters of accelerators, intended to simplify the development of domain science applications. It automatically manages data and work distribution, while also transparently enabling asynchronous overlapping of computation and communication. In this paper, we present a use case study of porting the full RSim application to GPU clusters using the Celerity system. In order to improve scalability, a new parallelization scheme was employed for the core simulation task, and Celerity was extended with a high-level split constraints feature which enables this scheme. We present strong- and weak-scaling experiments for the resulting application on three accelerator clusters with up to 128 GPUs, and also evaluate the relative programming effort required to distribute the application across multiple GPUs using different APIs.
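To illustrate the general idea behind the split constraints mentioned in the abstract, the following is a minimal, language-agnostic Python sketch (not the Celerity API; all names here are hypothetical): a runtime dividing a global iteration range across devices must sometimes honor a minimum granularity so that each chunk aligns with the simulation's data layout.

```python
def constrained_split(total, num_chunks, granularity):
    """Split the range [0, total) into num_chunks contiguous chunks
    whose sizes are all multiples of `granularity`.

    This mimics, in spirit, how a distributed runtime might split a
    kernel's iteration space across devices under a split constraint.
    """
    if total % granularity != 0:
        raise ValueError("total must be divisible by granularity")
    units = total // granularity          # number of indivisible blocks
    base, rem = divmod(units, num_chunks) # distribute blocks evenly
    chunks, start = [], 0
    for i in range(num_chunks):
        size = (base + (1 if i < rem else 0)) * granularity
        chunks.append((start, start + size))
        start += size
    return chunks

# e.g. 1000 work items over 3 devices, constrained to blocks of 100:
# each chunk boundary falls on a multiple of 100.
print(constrained_split(1000, 3, 100))
```

Without such a constraint, an even split of 1000 items over 3 devices would produce chunk boundaries (334, 667) that cut through the 100-item blocks the simulation's data layout assumes.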
DOI: 10.1007/s10766-025-00787-2