Search Results - Distributed-memory parallelisms
1
3D DFT by block tensor-matrix multiplication via a modified Cannon's algorithm: Implementation and scaling on distributed-memory clusters with fat tree networks
ISSN: 0743-7315
Published: Elsevier Inc 01.11.2024
Published in Journal of parallel and distributed computing (01.11.2024)
“…A known scalability bottleneck of the parallel 3D FFT is its use of all-to-all communications. Here, we present S3DFT, a library that circumvents this by using…”
Journal Article
2
A framework for exploiting task and data parallelism on distributed memory multicomputers
ISSN: 1045-9219
Published: IEEE 01.11.1997
Published in IEEE transactions on parallel and distributed systems (01.11.1997)
“… compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications-the simultaneous exploitation of task and data parallelism…”
Journal Article
3
Axially-deformed solution of the Skyrme-Hartree-Fock-Bogoliubov equations using the transformed harmonic oscillator basis (IV) hfbtho (v4.0): A new version of the program
ISSN: 0010-4655, 1879-2944
Published: United States Elsevier B.V 01.07.2022
Published in Computer physics communications (01.07.2022)
“…We describe the new version 4.0 of the code hfbtho that solves the nuclear Hartree-Fock-Bogoliubov problem by using the deformed harmonic oscillator basis in…”
Journal Article
4
superB/NRPy: scalable, task-based numerical relativity for 3G gravitational wave science
ISSN: 0264-9381, 1361-6382
Published: IOP Publishing 01.08.2025
Published in Classical and quantum gravity (01.08.2025)
Journal Article
5
Iterators, Schedulers, and Distributed-memory Parallelism
ISSN: 0038-0644, 1097-024X
Published: New York John Wiley & Sons, Ltd 01.04.1996
Published in Software, practice & experience (01.04.1996)
“…’ for sequential and parallel query evaluation. Unfortunately, those earlier models have a severe drawback with respect to resource allocation in distributed‐memory systems…”
Journal Article
6
Massively parallel implementation and approaches to simulate quantum dynamics using Krylov subspace techniques
ISSN: 0010-4655, 1879-2944
Published: Elsevier B.V 01.02.2019
Published in Computer physics communications (01.02.2019)
“…We have developed an application and implemented parallel algorithms in order to provide a computational framework suitable for massively parallel…”
Journal Article
7
Leveraging HPC accelerator architectures with modern techniques — hydrologic modeling on GPUs with ParFlow
ISSN: 1420-0597, 1573-1499
Published: Cham Springer International Publishing 01.10.2021
Published in Computational geosciences (01.10.2021)
“…Rapidly changing heterogeneous supercomputer architectures pose a great challenge to many scientific communities trying to leverage the latest technology in…”
Journal Article
8
MPI+X: task-based parallelisation and dynamic load balance of finite element assembly
ISSN: 1061-8562, 1029-0257
Published: Abingdon Taylor & Francis 16.03.2019
Published in International journal of computational fluid dynamics (16.03.2019)
“… of the MPI partitions to compute element matrices and vectors and then of their assemblies. In a MPI+X hybrid parallelism context, X has consisted traditionally of loop…”
Journal Article
9
Parallelization of a distributed ecohydrological model
ISSN: 1364-8152, 1873-6726
Published: Oxford Elsevier Ltd 01.03.2018
Published in Environmental modelling & software : with environment data news (01.03.2018)
“… High resolution simulations at a large scale are therefore computationally expensive and cause a run-time memory burden. Using distributed (MPI) and shared (OpenMP…”
Journal Article
10
A scalable scheduling scheme for functional parallelism on distributed memory multiprocessor systems
ISSN: 1045-9219
Published: Los Alamitos, CA IEEE 01.04.1995
Published in IEEE transactions on parallel and distributed systems (01.04.1995)
“… and partially at run time. Assuming infinite number of processors, the compile time schedule is found using a new concept of the threshold of a task that quantifies a trade-off between the schedule-length and the degree of parallelism…”
Journal Article
11
A shared compilation stack for distributed-memory parallelism in stencil DSLs
ISSN: 2331-8422
Published: Ithaca Cornell University Library, arXiv.org 02.04.2024
Published in arXiv.org (02.04.2024)
“…Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express…”
Paper
12
A Robust Compile Time Method for Scheduling Task Parallelism on Distributed Memory Machines
ISSN: 0920-8542, 1573-0484
Published: 01.10.1998
Published in The Journal of supercomputing (01.10.1998)
“…A compile time scheduling algorithm for a variable number of available processors is introduced and the impact of the change of computation and communication…”
Journal Article
13
On the Test Particle Monte-Carlo method to solve the steady state Boltzmann equation, the congruity of its results with experiments and its potential for shared memory parallelism
ISSN: 0021-9991, 1090-2716
Published: Cambridge Elsevier Inc 01.11.2021
Published in Journal of computational physics (01.11.2021)
“…The Test Particle Monte Carlo is a known method to solve the steady state Boltzmann particle transport equation in rarefied gas systems. A description of the…”
Journal Article
14
High-Performance Sorting-Based k-mer Counting in Distributed Memory with Flexible Hybrid Parallelism
ISSN: 2331-8422
Published: Ithaca Cornell University Library, arXiv.org 10.07.2024
Published in arXiv.org (10.07.2024)
“… Due to the growing volume of data, the scaling of the counting process is critical. In the literature, distributed memory software uses hash tables, which exhibit poor cache friendliness and consume excessive memory…”
Paper
15
CAPTURE: Memory-Centric Partitioning for Distributed DNN Training with Hybrid Parallelism
ISSN: 2640-0316
Published: IEEE 18.12.2023
Published in Proceedings - International Conference on High Performance Computing (18.12.2023)
“… Hybrid-parallel training approaches have emerged that combine pipelining with data and tensor parallelism to facilitate the training of large DL models on distributed hardware setups…”
Conference Proceeding
16
A study of shared-memory parallelism in a multifrontal solver
ISSN: 0167-8191, 1872-7336
Published: Elsevier B.V 01.03.2014
Published in Parallel computing (01.03.2014)
“… We introduce shared-memory parallelism in a parallel distributed-memory solver, targeting multi-core architectures…”
Journal Article
17
A robust compile time method for scheduling task parallelism on distributed memory machines
ISBN: 9780818676338, 0818676337
ISSN: 1089-795X
Published: IEEE 1996
Published in Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (1996)
“…A desirable property of a compile time scheduling algorithm is robustness against the variations in the computation and communication costs so that the run…”
Conference Proceeding
18
Reservoir Echo State Network for Classification of Multivariate Time Series
ISSN: 2770-0135
Published: IEEE 18.12.2023
Published in Proceedings (IEEE International Conference on High Performance Computing Workshops) (18.12.2023)
“… It leverages both CPU-shared memory and parallel distributed memory architecture to efficiently capture reservoir state's optimal model space representation, addressing computational challenges in MTS analysis…”
Conference Proceeding
19
Swift: a modern highly parallel gravity and smoothed particle hydrodynamics solver for astrophysical and cosmological applications
ISSN: 0035-8711, 1365-2966
Published: London Oxford University Press 01.05.2024
Published in Monthly notices of the Royal Astronomical Society (01.05.2024)
“… The software package exploits hybrid shared- and distributed-memory task-based parallelism, asynchronous communications, and domain-decomposition algorithms based on balancing the workload, rather…”
Journal Article
20
Automated MPI-X Code Generation for Scalable Finite-Difference Solvers
ISSN: 1530-2075
Published: IEEE 03.06.2025
Published in Proceedings - IEEE International Parallel and Distributed Processing Symposium (03.06.2025)
“… This paper introduces automated code generation techniques specifically tailored for distributed memory parallelism (DMP…”
Conference Proceeding

