Search Results - Distributed-memory parallelisms

  1. 1

    3D DFT by block tensor-matrix multiplication via a modified Cannon's algorithm: Implementation and scaling on distributed-memory clusters with fat tree networks by Malapally, Nitin, Bolnykh, Viacheslav, Suarez, Estela, Carloni, Paolo, Lippert, Thomas, Mandelli, Davide

    ISSN: 0743-7315
    Published: Elsevier Inc 01.11.2024
    “…A known scalability bottleneck of the parallel 3D FFT is its use of all-to-all communications. Here, we present S3DFT, a library that circumvents this by using…”
    Journal Article
  2. 2

    A framework for exploiting task and data parallelism on distributed memory multicomputers by Ramaswamy, S., Sapatnekar, S., Banerjee, P.

    ISSN: 1045-9219
    Published: IEEE 01.11.1997
    “… compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications-the simultaneous exploitation of task and data parallelism…”
    Journal Article
  3. 3

    Axially-deformed solution of the Skyrme-Hartree-Fock-Bogoliubov equations using the transformed harmonic oscillator basis (IV) hfbtho (v4.0): A new version of the program by Marević, P., Schunck, N., Ney, E.M., Navarro Pérez, R., Verriere, M., O'Neal, J.

    ISSN: 0010-4655, 1879-2944
    Published: United States Elsevier B.V 01.07.2022
    Published in Computer physics communications (01.07.2022)
    “…We describe the new version 4.0 of the code hfbtho that solves the nuclear Hartree-Fock-Bogoliubov problem by using the deformed harmonic oscillator basis in…”
    Journal Article
  4. 4
  5. 5

    Iterators, Schedulers, and Distributed-memory Parallelism by GRAEFE, GOETZ

    ISSN: 0038-0644, 1097-024X
    Published: New York John Wiley & Sons, Ltd 01.04.1996
    Published in Software, practice & experience (01.04.1996)
    “…’ for sequential and parallel query evaluation. Unfortunately, those earlier models have a severe drawback with respect to resource allocation in distributed-memory systems…”
    Journal Article
  6. 6

    Massively parallel implementation and approaches to simulate quantum dynamics using Krylov subspace techniques by Brenes, Marlon, Varma, Vipin Kerala, Scardicchio, Antonello, Girotto, Ivan

    ISSN: 0010-4655, 1879-2944
    Published: Elsevier B.V 01.02.2019
    Published in Computer physics communications (01.02.2019)
    “…We have developed an application and implemented parallel algorithms in order to provide a computational framework suitable for massively parallel…”
    Journal Article
  7. 7

    Leveraging HPC accelerator architectures with modern techniques — hydrologic modeling on GPUs with ParFlow by Hokkanen, Jaro, Kollet, Stefan, Kraus, Jiri, Herten, Andreas, Hrywniak, Markus, Pleiter, Dirk

    ISSN: 1420-0597, 1573-1499
    Published: Cham Springer International Publishing 01.10.2021
    Published in Computational geosciences (01.10.2021)
    “…Rapidly changing heterogeneous supercomputer architectures pose a great challenge to many scientific communities trying to leverage the latest technology in…”
    Journal Article
  8. 8

    MPI+X: task-based parallelisation and dynamic load balance of finite element assembly by Garcia-Gasulla, Marta, Houzeaux, Guillaume, Ferrer, Roger, Artigues, Antoni, López, Victor, Labarta, Jesús, Vázquez, Mariano

    ISSN: 1061-8562, 1029-0257
    Published: Abingdon Taylor & Francis 16.03.2019
    “… of the MPI partitions to compute element matrices and vectors and then of their assemblies. In a MPI+X hybrid parallelism context, X has consisted traditionally of loop…”
    Journal Article
  9. 9

    Parallelization of a distributed ecohydrological model by Liu, Ning, Shaikh, Mohsin Ahmed, Kala, Jatin, Harper, Richard J., Dell, Bernard, Liu, Shirong, Sun, Ge

    ISSN: 1364-8152, 1873-6726
    Published: Oxford Elsevier Ltd 01.03.2018
    “… High resolution simulations at a large scale are therefore computationally expensive and cause a run-time memory burden. Using distributed (MPI) and shared (OpenMP…”
    Journal Article
  10. 10

    A scalable scheduling scheme for functional parallelism on distributed memory multiprocessor systems by Pande, S., Agrawal, D.P., Mauney, J.

    ISSN: 1045-9219
    Published: Los Alamitos, CA IEEE 01.04.1995
    “… and partially at run time. Assuming infinite number of processors, the compile time schedule is found using a new concept of the threshold of a task that quantifies a trade-off between the schedule-length and the degree of parallelism…”
    Journal Article
  11. 11

    A shared compilation stack for distributed-memory parallelism in stencil DSLs by Bisbas, George, Lydike, Anton, Bauer, Emilien, Brown, Nick, Fehr, Mathieu, Mitchell, Lawrence, Rodriguez-Canal, Gabriel, Jamieson, Maurice, Kelly, Paul H J, Steuwer, Michel, Grosser, Tobias

    ISSN: 2331-8422
    Published: Ithaca Cornell University Library, arXiv.org 02.04.2024
    Published in arXiv.org (02.04.2024)
    “…Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express…”
    Paper
  12. 12

    A Robust Compile Time Method for Scheduling Task Parallelism on Distributed Memory Machines by Darbha, Sekhar, Pande, Santosh

    ISSN: 0920-8542, 1573-0484
    Published: 01.10.1998
    Published in The Journal of supercomputing (01.10.1998)
    “…A compile time scheduling algorithm for a variable number of available processors is introduced and the impact of the change of computation and communication…”
    Journal Article
  13. 13

    On the Test Particle Monte-Carlo method to solve the steady state Boltzmann equation, the congruity of its results with experiments and its potential for shared memory parallelism by Rondeau, Maxime, Arès, R.

    ISSN: 0021-9991, 1090-2716
    Published: Cambridge Elsevier Inc 01.11.2021
    Published in Journal of computational physics (01.11.2021)
    “…The Test Particle Monte Carlo is a known method to solve the steady state Boltzmann particle transport equation in rarefied gas systems. A description of the…”
    Journal Article
  14. 14

    High-Performance Sorting-Based k-mer Counting in Distributed Memory with Flexible Hybrid Parallelism by Li, Yifan, Guidi, Giulia

    ISSN: 2331-8422
    Published: Ithaca Cornell University Library, arXiv.org 10.07.2024
    Published in arXiv.org (10.07.2024)
    “… Due to the growing volume of data, the scaling of the counting process is critical. In the literature, distributed memory software uses hash tables, which exhibit poor cache friendliness and consume excessive memory…”
    Paper
  15. 15

    CAPTURE: Memory-Centric Partitioning for Distributed DNN Training with Hybrid Parallelism by Dreuning, Henk, Verstoep, Kees, Bal, Henri E., van Nieuwpoort, Rob V.

    ISSN: 2640-0316
    Published: IEEE 18.12.2023
    “… Hybrid-parallel training approaches have emerged that combine pipelining with data and tensor parallelism to facilitate the training of large DL models on distributed hardware setups…”
    Conference Proceeding
  16. 16

    A study of shared-memory parallelism in a multifrontal solver by L’Excellent, Jean-Yves, Sid-Lakhdar, Wissam M.

    ISSN: 0167-8191, 1872-7336
    Published: Elsevier B.V 01.03.2014
    Published in Parallel computing (01.03.2014)
    “… We introduce shared-memory parallelism in a parallel distributed-memory solver, targeting multi-core architectures…”
    Journal Article
  17. 17

    A robust compile time method for scheduling task parallelism on distributed memory machines by Darbha, S., Pande, S.

    ISBN: 9780818676338, 0818676337
    ISSN: 1089-795X
    Published: IEEE 1996
    “…A desirable property of a compile time scheduling algorithm is robustness against the variations in the computation and communication costs so that the run…”
    Conference Proceeding
  18. 18

    Reservoir Echo State Network for Classification of Multivariate Time Series by Purkayastha, Basab Bijoy, Barma, Shovan

    ISSN: 2770-0135
    Published: IEEE 18.12.2023
    “… It leverages both CPU-shared memory and parallel distributed memory architecture to efficiently capture reservoir state's optimal model space representation, addressing computational challenges in MTS analysis…”
    Conference Proceeding
  19. 19
  20. 20

    Automated MPI-X Code Generation for Scalable Finite-Difference Solvers by Bisbas, George, Nelson, Rhodri, Louboutin, Mathias, Luporini, Fabio, Kelly, Paul H.J., Gorman, Gerard

    ISSN: 1530-2075
    Published: IEEE 03.06.2025
    “… This paper introduces automated code-generation techniques specifically tailored for distributed memory parallelism (DMP…”
    Conference Proceeding