Search results - parallel amd distributed computing

1

Automated parallel execution of distributed task graphs with FPGA clusters by de Haro Ruiz, Juan Miguel, Martínez, Carlos Álvarez, Jiménez-González, Daniel, Martorell, Xavier, Ueno, Tomohiro, Sano, Kentaro, Ringlein, Burkhard, Abel, François, Weiss, Beat

ISSN: 0167-739X

Published: Elsevier B.V 01.11.2024

Published in Future generation computer systems (01.11.2024)
“… Over the years, Field Programmable Gate Arrays (FPGA) have been gaining popularity in the High Performance Computing (HPC …”

Journal Article

2

PRNGine: Massively Parallel Pseudo-Random Number Generation and Probability Distribution Approximations on AMD AI Engines by Bouaziz, Mohamed, Fahmy, Suhaib A.

ISSN: 2995-066X

Published: IEEE 03.06.2025

Published in 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (03.06.2025)
“… Generating large volumes of random numbers is essential for high-performance computing applications such as Monte Carlo simulations, machine learning, and dynamic game-play …”

Conference Proceeding

3

StreamMR: An Optimized MapReduce Framework for AMD GPUs by Elteir, M., Heshan Lin, Wu-chun Feng, Scogland, T.

ISBN: 1457718758, 9781457718755

ISSN: 1521-9097

Published: IEEE 01.12.2011

Published in 2011 IEEE 17th International Conference on Parallel and Distributed Systems (01.12.2011)
“… MapReduce is a programming model from Google that facilitates parallel processing on a cluster of thousands of commodity computers …”

Conference Proceeding

4

Optimization and Portability of a Fusion OpenACC-based FORTRAN HPC Code from NVIDIA to AMD GPUs by Sfiligoi, Igor, Belli, Emily A, Candy, Jeff, Budiardja, Reuben D

ISSN: 2331-8422

Published: Ithaca Cornell University Library, arXiv.org 17.05.2023

Published in arXiv.org (17.05.2023)
“… Recent exascale HPC systems are, however, introducing GPUs from other vendors, e.g. with the AMD GPU-based OLCF Frontier system just becoming available …”

Paper

5

A method for decompilation of AMD GCN kernels to OpenCL by Mihajlenko, K I, Lukin, M A, Stankevich, A S

ISSN: 2331-8422

Published: Ithaca Cornell University Library, arXiv.org 16.07.2021

Published in arXiv.org (16.07.2021)
“… They are available for many hardware architectures and programming languages. However, none of the existing decompilers support modern AMD GPU architectures such as AMD GCN and RDNA. Purpose …”

Paper

6

Efficient and Distributed Computation of Electron Repulsion Integrals on AMD AI Engines by Menzel, Johannes, Plessl, Christian

ISSN: 2576-2621

Published: IEEE 04.05.2025

Published in Proceedings ... Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Online) (04.05.2025)
“… Computing electron repulsion integrals (ERIs) is the major computational bottleneck of many quantum mechanical simulation methods, requiring trillions of ERI evaluations per time step …”

Conference Proceeding

7

Distributed computation of the critical path from execution traces by Denys, Pierre‐Frédérick, Fournier, Quentin, Dagenais, Michel R.

ISSN: 0038-0644, 1097-024X

Published: Bognor Regis Wiley Subscription Services, Inc 01.08.2023

Published in Software, practice & experience (01.08.2023)
“… Due to the ever‐increasing number of computer nodes in distributed systems, efficient and effective tools have become crucial for their analysis …”

Journal Article

8

Performance portable Vlasov code with C++ parallel algorithm by Asahi, Yuuichi, Padioleau, Thomas, Latu, Guillaume, Bigot, Julien, Grandgirard, Virginie, Obrejan, Kevin

ISSN: 2831-3909

Published: IEEE 01.11.2022

Published in IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (Online) (01.11.2022)
“… parallel algorithm to run across multiple CPUs and GPUs. Relying on the language standard parallelism stdpar and proposed language standard multi-dimensional array …”

Conference Proceeding

9

GPU-Accelerated Tree-Search in Chapel Versus CUDA and HIP by Helbecque, Guillaume, Krishnasamy, Ezhilmathi, Melab, Nouredine, Bouvry, Pascal

Published: IEEE 27.05.2024

Published in 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (27.05.2024)
“… In the context of exascale programming, the PGAS-based Chapel is among the rare languages targeting the holistic handling of high-performance computing issues including the productivity-aware …”

Conference Proceeding

10

TaPaSCo-AIE: An Open-Source Framework for Streaming-Based Heterogeneous Acceleration Using AMD AI Engines by Heinz, Carsten, Kalkhof, Torben, Lavan, Yannick, Koch, Andreas

Published: IEEE 27.05.2024

Published in 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (27.05.2024)
“… AMD AI Engines (AIEs) extend the design space and open up new options for coarse-grained processing in re-configurable accelerators …”

Conference Proceeding

11

On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processors by Ouro, Pablo, Lopez-Novoa, Unai, Guest, Martyn

ISSN: 2331-8422

Published: Ithaca Cornell University Library, arXiv.org 12.10.2020

Published in arXiv.org (12.10.2020)
“… No area of computing is hungrier for performance than High Performance Computing (HPC …”

Paper

12

A Performance Model for GPUs with Caches by Thanh Tuan Dao, Jungwon Kim, Sangmin Seo, Egger, Bernhard, Jaejin Lee

ISSN: 1045-9219, 1558-2183

Published: New York IEEE 01.07.2015

Published in IEEE transactions on parallel and distributed systems (01.07.2015)
“… To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices …”

Journal Article

13

BCSR on GPU: A Way Forward Extreme-scale Graph Processing on Accelerator-enabled Frontier Supercomputer by Sattar, Naw Safrin, Lu, Hao, Wang, Feiyi

Published: IEEE 17.11.2024

Published in SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (17.11.2024)
“… Handling large graphs in a distributed environment requires effective partitioning across processors and efficient management of local partitions …”

Conference Proceeding

14

Dissecting the Software-Based Measurement of CPU Energy Consumption: A Comparative Analysis by Raffin, Guillaume, Trystram, Denis

ISSN: 1045-9219, 1558-2183

Published: IEEE 01.01.2025

Published in IEEE transactions on parallel and distributed systems (01.01.2025)
“… (and more) without the need for additional hardware. Since 2017, it is available on most x86 processors, including AMD processors …”

Journal Article

15

Cloud Colonography: Distributed Medical Testbed over Cloud by Motai, Yuichi, Henderson, Eric, Siddique, Nahian Alam, Yoshida, Hiroyuki

ISSN: 2168-7161, 2372-0018

Published: Piscataway IEEE Computer Society 01.04.2020

Published in IEEE transactions on cloud computing (01.04.2020)
“… The proposed AMD has the potential to play a role of the core classifier in the cloud computing framework …”

Journal Article

16

Integer Sum Reduction with OpenMP on an AMD MI100 GPU by Jin, Zheming, Vetter, Jeffrey S.

ISBN: 9781665497480

Published: IEEE 01.05.2022

Published in 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (01.05.2022)
“… Sum reduction is a primitive operation in parallel computing. Device offload support allows a user to use OpenMP directives to take advantage of a highly capable GPU …”

Conference Proceeding

17

A Fine-grained Prefetching Scheme for DGEMM Kernels on GPU with Auto-tuning Compatibility by Li, Jialin, Ye, Huang, Tian, Shaobo, Li, Xinyuan, Zhang, Jian

ISSN: 1530-2075

Published: IEEE 01.05.2022

Published in Proceedings - IEEE International Parallel and Distributed Processing Symposium (01.05.2022)
“… General Matrix Multiplication (GEMM) is one of the fundamental kernels for scientific and high-performance computing …”

Conference Proceeding

18

Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures by Byunghyun Jang, Schaa, Dana, Mistry, Perhaad, Kaeli, David

ISSN: 1045-9219, 1558-2183

Published: New York IEEE 01.01.2011

Published in IEEE transactions on parallel and distributed systems (01.01.2011)
“… The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing …”

Journal Article

19

Parallel breadth-first search on distributed memory systems by Buluç, Aydin, Madduri, Kamesh

ISBN: 145030771X, 9781450307710

ISSN: 2167-4329

Published: New York, NY, USA ACM 12.11.2011

Published in 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC) (12.11.2011)
“… Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems …”

Conference Proceeding

20

Multi-BSP vs. BSP: A Case of Study for Dell AMD Multicores by Trabes, Guillermo, Gil-Costa, Veronica, Printista, Marcela, Marin, Mauricio

ISSN: 2377-5750

Published: IEEE 01.03.2018

Published in Proceedings - Euromicro Workshop on Parallel and Distributed Processing (01.03.2018)
“… The Bulk-Synchronous Parallel (BSP) is a well-known computing model originally devised for distributed algorithms running on clusters of single-core processors …”

Conference Proceeding
