GPU Computing in Chapel: Application to Tree-Search Algorithms

Detailed bibliography
Title:	GPU Computing in Chapel: Application to Tree-Search Algorithms
Authors:	Helbecque, Guillaume, Krishnasamy, Ezhilmathi, Melab, Nouredine, Bouvry, Pascal
Contributors:	Helbecque, Guillaume
Publisher information:	2024.
Publication year:	2024
Subjects:	Backtracking, Chapel, [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], GPU Programming, N-Queens, Tree-Search
Description:	We investigate the design and implementation of a GPU-accelerated tree-search algorithm in Chapel. This work is motivated by Chapel's emerging GPU support, which offers an alternative to traditional low-level programming environments such as CUDA. The algorithm is based on a general multi-pool approach equipped with a load-balancing mechanism. It is evaluated on the N-Queens problem and compared to a CUDA baseline implementation using up to 8 GPUs. Both Nvidia and AMD GPU architectures are considered. We demonstrate that Chapel's high level of abstraction causes a performance loss of only 10% in our experiments, and that our algorithm achieves up to 75% of the linear speed-up in the best scenarios.
Document type:	Conference object
Language:	English
Access URL:	https://hal.science/hal-04551844v1
Accession number:	edsair.dedup.wf.002..80dafcfaf1d70a63971b9a8fee2f97d5
Database:	OpenAIRE
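
The description mentions a multi-pool tree search with load balancing, applied to N-Queens. As a rough illustration of that idea only, here is a minimal sequential Python sketch. It is not the authors' Chapel or CUDA implementation and performs no GPU offloading. The names `is_safe` and `count_solutions`, the `num_pools` parameter, and the policy of stealing the oldest node from the fullest pool are all assumptions made for this example.

```python
from collections import deque


def is_safe(placed, col):
    """Return True if a queen in the next row at column `col` attacks no placed queen."""
    row = len(placed)
    for r, c in enumerate(placed):
        # Same column, or same diagonal (column distance equals row distance).
        if c == col or abs(c - col) == row - r:
            return False
    return True


def count_solutions(n, num_pools=4):
    """Count N-Queens solutions using several work pools (hypothetical sketch).

    Each pool is processed depth-first. An empty pool steals the oldest
    node from the fullest pool, a simple stand-in for load balancing.
    The pools are visited round-robin in one thread, so this shows the
    control flow only, not real parallelism.
    """
    pools = [deque() for _ in range(num_pools)]
    pools[0].append(())  # root node: no queens placed yet
    solutions = 0
    while any(pools):
        for pool in pools:
            if not pool:
                victim = max(pools, key=len)
                if not victim:
                    continue  # no work left anywhere
                pool.append(victim.popleft())  # steal the oldest node
            node = pool.pop()  # depth-first: take the newest node
            if len(node) == n:
                solutions += 1
                continue
            # Branch: one child per non-attacked column in the next row.
            for col in range(n):
                if is_safe(node, col):
                    pool.append(node + (col,))
    return solutions


if __name__ == "__main__":
    print(count_solutions(8))
```

Each node is a tuple of column indices, one per placed queen, so every node is explored exactly once and the count does not depend on `num_pools`. In a GPU setting, the per-node safety checks are the kind of work one might batch and offload, but that step is deliberately left out here.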
