Search results - distributed-memory parallelism

  1.

    3D DFT by block tensor-matrix multiplication via a modified Cannon's algorithm: Implementation and scaling on distributed-memory clusters with fat tree networks by Malapally, Nitin, Bolnykh, Viacheslav, Suarez, Estela, Carloni, Paolo, Lippert, Thomas, Mandelli, Davide

    ISSN: 0743-7315
    Published: Elsevier Inc 01.11.2024
    Published in Journal of parallel and distributed computing (01.11.2024)
    “… A known scalability bottleneck of the parallel 3D FFT is its use of all-to-all communications. Here, we present S3DFT, a library that circumvents this by using …”
    Full text
    Journal Article
  2.

    A framework for exploiting task and data parallelism on distributed memory multicomputers by Ramaswamy, S., Sapatnekar, S., Banerjee, P.

    ISSN: 1045-9219
    Published: IEEE 01.11.1997
    “… compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications-the simultaneous exploitation of task and data parallelism …”
    Full text
    Journal Article
  3.

    Axially-deformed solution of the Skyrme-Hartree-Fock-Bogoliubov equations using the transformed harmonic oscillator basis (IV) hfbtho (v4.0): A new version of the program by Marević, P., Schunck, N., Ney, E.M., Navarro Pérez, R., Verriere, M., O'Neal, J.

    ISSN: 0010-4655, 1879-2944
    Published: United States Elsevier B.V 01.07.2022
    Published in Computer physics communications (01.07.2022)
    “… We describe the new version 4.0 of the code hfbtho that solves the nuclear Hartree-Fock-Bogoliubov problem by using the deformed harmonic oscillator basis in …”
    Full text
    Journal Article
  4.
  5.

    Iterators, Schedulers, and Distributed-memory Parallelism by GRAEFE, GOETZ

    ISSN: 0038-0644, 1097-024X
    Published: New York John Wiley & Sons, Ltd 01.04.1996
    Published in Software, practice & experience (01.04.1996)
    “… ’ for sequential and parallel query evaluation. Unfortunately, those earlier models have a severe drawback with respect to resource allocation in distributed-memory systems …”
    Full text
    Journal Article
  6.

    Massively parallel implementation and approaches to simulate quantum dynamics using Krylov subspace techniques by Brenes, Marlon, Varma, Vipin Kerala, Scardicchio, Antonello, Girotto, Ivan

    ISSN: 0010-4655, 1879-2944
    Published: Elsevier B.V 01.02.2019
    Published in Computer physics communications (01.02.2019)
    “… We have developed an application and implemented parallel algorithms in order to provide a computational framework suitable for massively parallel …”
    Full text
    Journal Article
  7.

    Leveraging HPC accelerator architectures with modern techniques — hydrologic modeling on GPUs with ParFlow by Hokkanen, Jaro, Kollet, Stefan, Kraus, Jiri, Herten, Andreas, Hrywniak, Markus, Pleiter, Dirk

    ISSN: 1420-0597, 1573-1499
    Published: Cham Springer International Publishing 01.10.2021
    Published in Computational geosciences (01.10.2021)
    “… Rapidly changing heterogeneous supercomputer architectures pose a great challenge to many scientific communities trying to leverage the latest technology in …”
    Full text
    Journal Article
  8.

    MPI+X: task-based parallelisation and dynamic load balance of finite element assembly by Garcia-Gasulla, Marta, Houzeaux, Guillaume, Ferrer, Roger, Artigues, Antoni, López, Victor, Labarta, Jesús, Vázquez, Mariano

    ISSN: 1061-8562, 1029-0257
    Published: Abingdon Taylor & Francis 16.03.2019
    “… of the MPI partitions to compute element matrices and vectors and then of their assemblies. In a MPI+X hybrid parallelism context, X has consisted traditionally of loop …”
    Full text
    Journal Article
  9.

    Parallelization of a distributed ecohydrological model by Liu, Ning, Shaikh, Mohsin Ahmed, Kala, Jatin, Harper, Richard J., Dell, Bernard, Liu, Shirong, Sun, Ge

    ISSN: 1364-8152, 1873-6726
    Published: Oxford Elsevier Ltd 01.03.2018
    “… High resolution simulations at a large scale are therefore computationally expensive and cause a run-time memory burden. Using distributed (MPI) and shared (OpenMP …”
    Full text
    Journal Article
  10.

    A scalable scheduling scheme for functional parallelism on distributed memory multiprocessor systems by Pande, S., Agrawal, D.P., Mauney, J.

    ISSN: 1045-9219
    Published: Los Alamitos, CA IEEE 01.04.1995
    “… and partially at run time. Assuming infinite number of processors, the compile time schedule is found using a new concept of the threshold of a task that quantifies a trade-off between the schedule-length and the degree of parallelism …”
    Full text
    Journal Article
  11.

    A shared compilation stack for distributed-memory parallelism in stencil DSLs by Bisbas, George, Lydike, Anton, Bauer, Emilien, Brown, Nick, Fehr, Mathieu, Mitchell, Lawrence, Rodriguez-Canal, Gabriel, Jamieson, Maurice, Kelly, Paul H J, Steuwer, Michel, Grosser, Tobias

    ISSN: 2331-8422
    Published: Ithaca Cornell University Library, arXiv.org 02.04.2024
    Published in arXiv.org (02.04.2024)
    “… Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express …”
    Full text
    Paper
  12.

    A Robust Compile Time Method for Scheduling Task Parallelism on Distributed Memory Machines by Darbha, Sekhar, Pande, Santosh

    ISSN: 0920-8542, 1573-0484
    Published: 01.10.1998
    Published in The Journal of supercomputing (01.10.1998)
    “… A compile time scheduling algorithm for a variable number of available processors is introduced and the impact of the change of computation and communication …”
    Full text
    Journal Article
  13.

    On the Test Particle Monte-Carlo method to solve the steady state Boltzmann equation, the congruity of its results with experiments and its potential for shared memory parallelism by Rondeau, Maxime, Arès, R.

    ISSN: 0021-9991, 1090-2716
    Published: Cambridge Elsevier Inc 01.11.2021
    Published in Journal of computational physics (01.11.2021)
    “… The Test Particle Monte Carlo is a known method to solve the steady state Boltzmann particle transport equation in rarefied gas systems. A description of the …”
    Full text
    Journal Article
  14.

    High-Performance Sorting-Based k-mer Counting in Distributed Memory with Flexible Hybrid Parallelism by Li, Yifan, Guidi, Giulia

    ISSN: 2331-8422
    Published: Ithaca Cornell University Library, arXiv.org 10.07.2024
    Published in arXiv.org (10.07.2024)
    “… Due to the growing volume of data, the scaling of the counting process is critical. In the literature, distributed memory software uses hash tables, which exhibit poor cache friendliness and consume excessive memory …”
    Full text
    Paper
  15.

    CAPTURE: Memory-Centric Partitioning for Distributed DNN Training with Hybrid Parallelism by Dreuning, Henk, Verstoep, Kees, Bal, Henri E., van Nieuwpoort, Rob V.

    ISSN: 2640-0316
    Published: IEEE 18.12.2023
    “… Hybrid-parallel training approaches have emerged that combine pipelining with data and tensor parallelism to facilitate the training of large DL models on distributed hardware setups …”
    Full text
    Conference Proceeding
  16.

    A study of shared-memory parallelism in a multifrontal solver by L’Excellent, Jean-Yves, Sid-Lakhdar, Wissam M.

    ISSN: 0167-8191, 1872-7336
    Published: Elsevier B.V 01.03.2014
    Published in Parallel computing (01.03.2014)
    “… We introduce shared-memory parallelism in a parallel distributed-memory solver, targeting multi-core architectures …”
    Full text
    Journal Article
  17.

    A robust compile time method for scheduling task parallelism on distributed memory machines by Darbha, S., Pande, S.

    ISBN: 9780818676338, 0818676337
    ISSN: 1089-795X
    Published: IEEE 1996
    “… A desirable property of a compile time scheduling algorithm is robustness against the variations in the computation and communication costs so that the run …”
    Full text
    Conference Proceeding
  18.

    Reservoir Echo State Network for Classification of Multivariate Time Series by Purkayastha, Basab Bijoy, Barma, Shovan

    ISSN: 2770-0135
    Published: IEEE 18.12.2023
    “… It leverages both CPU-shared memory and parallel distributed memory architecture to efficiently capture reservoir state's optimal model space representation, addressing computational challenges in MTS analysis …”
    Full text
    Conference Proceeding
  19.

    Automated MPI-X Code Generation for Scalable Finite-Difference Solvers by Bisbas, George, Nelson, Rhodri, Louboutin, Mathias, Luporini, Fabio, Kelly, Paul H.J., Gorman, Gerard

    ISSN: 1530-2075
    Published: IEEE 03.06.2025
    “… This paper introduces automated code-generation techniques specifically tailored for distributed memory parallelism (DMP …”
    Full text
    Conference Proceeding
  20.

    Scalable Adaptive PDE Solvers in Arbitrary Domains by Kumar, Saurabh, Ishii, Masado, Fernando, Milinda, Gao, Boshun, Tan, Kendrick, Hsu, Ming-Chen, Krishnamurthy, Adarsh, Sundar, Hari, Ganapathysubramanian, Baskar

    ISSN: 2167-4337
    Published: ACM 14.11.2021
    “… Efficiently and accurately simulating partial differential equations (PDEs) in and around arbitrarily defined geometries, especially with high levels of …”
    Full text
    Conference Proceeding