Search results - parallel amd distributed computing

  1.

    Automated parallel execution of distributed task graphs with FPGA clusters by de Haro Ruiz, Juan Miguel, Martínez, Carlos Álvarez, Jiménez-González, Daniel, Martorell, Xavier, Ueno, Tomohiro, Sano, Kentaro, Ringlein, Burkhard, Abel, François, Weiss, Beat

    ISSN: 0167-739X
    Published: Elsevier B.V., 2024-11-01
    Published in Future Generation Computer Systems (2024-11-01)
    “… Over the years, Field Programmable Gate Arrays (FPGA) have been gaining popularity in the High Performance Computing (HPC …”
    Journal Article
  2.

    PRNGine: Massively Parallel Pseudo-Random Number Generation and Probability Distribution Approximations on AMD AI Engines by Bouaziz, Mohamed, Fahmy, Suhaib A.

    ISSN: 2995-066X
    Published: IEEE, 2025-06-03
    “… Generating large volumes of random numbers is essential for high-performance computing applications such as Monte Carlo simulations, machine learning, and dynamic game-play …”
    Conference Proceeding
  3.

    StreamMR: An Optimized MapReduce Framework for AMD GPUs by Elteir, M., Heshan Lin, Wu-chun Feng, Scogland, T.

    ISBN: 1457718758, 9781457718755
    ISSN: 1521-9097
    Published: IEEE, 2011-12-01
    “… MapReduce is a programming model from Google that facilitates parallel processing on a cluster of thousands of commodity computers …”
    Conference Proceeding
  4.

    Optimization and Portability of a Fusion OpenACC-based FORTRAN HPC Code from NVIDIA to AMD GPUs by Sfiligoi, Igor, Belli, Emily A, Candy, Jeff, Budiardja, Reuben D

    ISSN: 2331-8422
    Published: Ithaca: Cornell University Library, arXiv.org, 2023-05-17
    Published in arXiv.org (2023-05-17)
    “… Recent exascale HPC systems are, however, introducing GPUs from other vendors, e.g. with the AMD GPU-based OLCF Frontier system just becoming available …”
    Paper
  5.

    A method for decompilation of AMD GCN kernels to OpenCL by Mihajlenko, K I, Lukin, M A, Stankevich, A S

    ISSN: 2331-8422
    Published: Ithaca: Cornell University Library, arXiv.org, 2021-07-16
    Published in arXiv.org (2021-07-16)
    “… They are available for many hardware architectures and programming languages. However, none of the existing decompilers support modern AMD GPU architectures such as AMD GCN and RDNA. Purpose …”
    Paper
  6.

    Efficient and Distributed Computation of Electron Repulsion Integrals on AMD AI Engines by Menzel, Johannes, Plessl, Christian

    ISSN: 2576-2621
    Published: IEEE, 2025-05-04
    “… Computing electron repulsion integrals (ERIs) is the major computational bottleneck of many quantum mechanical simulation methods, requiring trillions of ERI evaluations per time step …”
    Conference Proceeding
  7.

    Distributed computation of the critical path from execution traces by Denys, Pierre‐Frédérick, Fournier, Quentin, Dagenais, Michel R.

    ISSN: 0038-0644, 1097-024X
    Published: Bognor Regis: Wiley Subscription Services, Inc., 2023-08-01
    Published in Software: Practice and Experience (2023-08-01)
    “… Due to the ever‐increasing number of computer nodes in distributed systems, efficient and effective tools have become crucial for their analysis …”
    Journal Article
  8.

    Performance portable Vlasov code with C++ parallel algorithm by Asahi, Yuuichi, Padioleau, Thomas, Latu, Guillaume, Bigot, Julien, Grandgirard, Virginie, Obrejan, Kevin

    ISSN: 2831-3909
    Published: IEEE, 2022-11-01
    “… parallel algorithm to run across multiple CPUs and GPUs. Relying on the language standard parallelism stdpar and proposed language standard multi-dimensional array …”
    Conference Proceeding
  9.

    GPU-Accelerated Tree-Search in Chapel Versus CUDA and HIP by Helbecque, Guillaume, Krishnasamy, Ezhilmathi, Melab, Nouredine, Bouvry, Pascal

    Published: IEEE, 2024-05-27
    “… In the context of exascale programming, the PGAS-based Chapel is among the rare languages targeting the holistic handling of high-performance computing issues including the productivity-aware …”
    Conference Proceeding
  10.

    TaPaSCo-AIE: An Open-Source Framework for Streaming-Based Heterogeneous Acceleration Using AMD AI Engines by Heinz, Carsten, Kalkhof, Torben, Lavan, Yannick, Koch, Andreas

    Published: IEEE, 2024-05-27
    “… AMD AI Engines (AIEs) extend the design space and open up new options for coarse-grained processing in re-configurable accelerators …”
    Conference Proceeding
  11.

    On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processors by Ouro, Pablo, Lopez-Novoa, Unai, Guest, Martyn

    ISSN: 2331-8422
    Published: Ithaca: Cornell University Library, arXiv.org, 2020-10-12
    Published in arXiv.org (2020-10-12)
    “… No area of computing is hungrier for performance than High Performance Computing (HPC …”
    Paper
  12.

    A Performance Model for GPUs with Caches by Thanh Tuan Dao, Jungwon Kim, Sangmin Seo, Egger, Bernhard, Jaejin Lee

    ISSN: 1045-9219, 1558-2183
    Published: New York: IEEE, 2015-07-01
    “… To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices …”
    Journal Article
  13.

    BCSR on GPU: A Way Forward Extreme-scale Graph Processing on Accelerator-enabled Frontier Supercomputer by Sattar, Naw Safrin, Lu, Hao, Wang, Feiyi

    Published: IEEE, 2024-11-17
    “… Handling large graphs in a distributed environment requires effective partitioning across processors and efficient management of local partitions …”
    Conference Proceeding
  14.

    Dissecting the Software-Based Measurement of CPU Energy Consumption: A Comparative Analysis by Raffin, Guillaume, Trystram, Denis

    ISSN: 1045-9219, 1558-2183
    Published: IEEE, 2025-01-01
    “… (and more) without the need for additional hardware. Since 2017, it is available on most x86 processors, including AMD processors …”
    Journal Article
  15.

    Cloud Colonography: Distributed Medical Testbed over Cloud by Motai, Yuichi, Henderson, Eric, Siddique, Nahian Alam, Yoshida, Hiroyuki

    ISSN: 2168-7161, 2372-0018
    Published: Piscataway: IEEE Computer Society, 2020-04-01
    Published in IEEE Transactions on Cloud Computing (2020-04-01)
    “… The proposed AMD has the potential to play a role of the core classifier in the cloud computing framework …”
    Journal Article
  16.

    Integer Sum Reduction with OpenMP on an AMD MI100 GPU by Jin, Zheming, Vetter, Jeffrey S.

    ISBN: 9781665497480
    Published: IEEE, 2022-05-01
    “… Sum reduction is a primitive operation in parallel computing. Device offload support allows a user to use OpenMP directives to take advantage of a highly capable GPU …”
    Conference Proceeding
  17.

    A Fine-grained Prefetching Scheme for DGEMM Kernels on GPU with Auto-tuning Compatibility by Li, Jialin, Ye, Huang, Tian, Shaobo, Li, Xinyuan, Zhang, Jian

    ISSN: 1530-2075
    Published: IEEE, 2022-05-01
    “… General Matrix Multiplication (GEMM) is one of the fundamental kernels for scientific and high-performance computing …”
    Conference Proceeding
  18.

    Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures by Byunghyun Jang, Schaa, Dana, Mistry, Perhaad, Kaeli, David

    ISSN: 1045-9219, 1558-2183
    Published: New York: IEEE, 2011-01-01
    “… The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing …”
    Journal Article
  19.

    Parallel breadth-first search on distributed memory systems by Buluç, Aydin, Madduri, Kamesh

    ISBN: 145030771X, 9781450307710
    ISSN: 2167-4329
    Published: New York, NY, USA: ACM, 2011-11-12
    “… Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems …”
    Conference Proceeding
  20.

    Multi-BSP vs. BSP: A Case of Study for Dell AMD Multicores by Trabes, Guillermo, Gil-Costa, Veronica, Printista, Marcela, Marin, Mauricio

    ISSN: 2377-5750
    Published: IEEE, 2018-03-01
    “… The Bulk-Synchronous Parallel (BSP) is a well-known computing model originally devised for distributed algorithms running on clusters of single-core processors …”
    Conference Proceeding