Search Results - Computing methodologies→Massively parallel algorithms

1

ANT-MOC: Scalable Neutral Particle Transport Using 3D Method of Characteristics on Multi-GPU Systems by Li, Shunde, Wang, Zongguo, Bu, Lingkun, Wang, Jue, Xin, Zhikuang, Li, Shigang, Wang, Yangang, Feng, Yangde, Shi, Peng, Hu, Yun, Chi, Xuebin

ISSN: 2167-4337

Published: ACM 11.11.2023

Published in International Conference for High Performance Computing, Networking, Storage and Analysis (Online) (11.11.2023)
“…The Method Of Characteristic (MOC) to solve the Neutron Transport Equation (NTE) is the core of full-core simulation for reactors. High resolution is enabled…”

Conference Proceeding

2

Automatic Loop Invariant Generation for Data Dependence Analysis by Tabar, Asmae Heydari, Bubel, Richard, Hahnle, Reiner

ISSN: 2575-5099

Published: ACM 01.05.2022

Published in 2022 IEEE/ACM 10th International Conference on Formal Methods in Software Engineering (FormaliSE) (01.05.2022)
“…Parallelization of programs relies on sound and precise analysis of data dependences in the code, specifically, when dealing with loops. State-of-art tools are…”

Conference Proceeding

3

In Situ Workload Estimation for Block Assignment and Duplication in Parallelization‐Over‐Data Particle Advection by Wang, Zhe, Moreland, Kenneth, Larsen, Matthew, Kress, James, Childs, Hank, Li, Guan, Shan, Guihua, Pugmire, David

ISSN: 0167-7055, 1467-8659

Published: Oxford Blackwell Publishing Ltd 01.06.2025

Published in Computer graphics forum (01.06.2025)
“…Particle advection is a foundational algorithm for analyzing a flow field. The commonly used Parallelization‐Over‐Data (POD…”

Journal Article

4

On‐The‐Fly Tracking of Flame Surfaces for the Visual Analysis of Combustion Processes by Oster, T., Abdelsamie, A., Motejat, M., Gerrits, T., Rössl, C., Thévenin, D., Theisel, H.

ISSN: 0167-7055, 1467-8659

Published: Oxford Blackwell Publishing Ltd 01.09.2018

Published in Computer graphics forum (01.09.2018)
“… We present an on‐the‐fly method for tracking the flame surface directly during simulation and computing the local tangential surface deformation for arbitrary time intervals…”

Journal Article

5

A Halfedge Refinement Rule for Parallel Catmull‐Clark Subdivision by Dupuy, J., Vanhoey, K.

ISSN: 0167-7055, 1467-8659

Published: Oxford Blackwell Publishing Ltd 01.12.2021

Published in Computer graphics forum (01.12.2021)
“… We leverage these results to derive a novel parallel implementation of Catmull‐Clark subdivision suitable for the GPU…”

Journal Article

6

Montblanc (https://github.com/ska-sa/montblanc): GPU accelerated radio interferometer measurement equations in support of Bayesian inference for radio observations by Perkins, S.J., Marais, P.C., Zwart, J.T.L., Natarajan, I., Tasse, C., Smirnov, O.

ISSN: 2213-1337, 2213-1345

Published: Elsevier B.V 01.09.2015

Published in Astronomy and computing (01.09.2015)
“… As most of the elements of the RIME and χ² calculation are independent of one another, they are highly amenable to parallel computation…”

Journal Article

7

Scaling deep learning on GPU and knights landing clusters by You, Yang, Buluç, Aydın, Demmel, James

ISBN: 9781450351140, 145035114X

ISSN: 2167-4337

Published: New York, NY, USA ACM 12.11.2017

Published in International Conference for High Performance Computing, Networking, Storage and Analysis (Online) (12.11.2017)
“…) clusters and multi-GPU clusters as our target platforms. From the algorithm aspect, we focus on Elastic Averaging SGD (EASGD…”

Conference Proceeding

8

Massively parallel 3D image reconstruction by Wang, Xiao, Sabne, Amit, Sakdhnagool, Putt, Kisner, Sherman J., Bouman, Charles A., Midkiff, Samuel P.

ISBN: 9781450351140, 145035114X

ISSN: 2167-4337

Published: New York, NY, USA ACM 12.11.2017

Published in International Conference for High Performance Computing, Networking, Storage and Analysis (Online) (12.11.2017)
“… This paper presents a new algorithm for MBIR, the Non-Uniform Parallel Super-Voxel (NU-PSV) algorithm, that regularizes the data access pattern, enables massive parallelism, and ensures fast convergence…”

Conference Proceeding

9

Scalable reduction collectives with data partitioning-based multi-leader design by Bayatpour, Mohammadreza, Chakraborty, Sourav, Subramoni, Hari, Lu, Xiaoyi, Panda, Dhabaleswar K. (DK)

ISBN: 9781450351140, 145035114X

ISSN: 2167-4337

Published: New York, NY, USA ACM 12.11.2017

Published in International Conference for High Performance Computing, Networking, Storage and Analysis (Online) (12.11.2017)
“…Existing designs for MPI_Allreduce do not take advantage of the vast parallelism available in modern multi-/many-core processors like Intel Xeon/Xeon Phis or…”

Conference Proceeding

10

Seed-and-Vote based In-Memory Accelerator for DNA Read Mapping by Laguna, Ann Franchesca, Gamaarachchi, Hasindu, Yin, Xunzhao, Niemier, Michael, Parameswaran, Sri, Hu, X. Sharon

ISSN: 1558-2434

Published: Association for Computing Machinery 02.11.2020

Published in Digest of technical papers - IEEE/ACM International Conference on Computer-Aided Design (02.11.2020)
“… In-memory computing can help address the memory-bandwidth bottleneck by minimizing data transfers…”

Conference Proceeding

11

Gravel: fine-grain GPU-initiated network messages by Orr, Marc S., Che, Shuai, Beckmann, Bradford M., Oskin, Mark, Reinhardt, Steven K., Wood, David A.

ISBN: 9781450351140, 145035114X

ISSN: 2167-4337

Published: New York, NY, USA ACM 12.11.2017

Published in International Conference for High Performance Computing, Networking, Storage and Analysis (Online) (12.11.2017)
“… (implemented with CPU threads), which combines messages targeting to the same destination. Gravel leverages diverged work-group-level semantics to amortize synchronization across the GPU's data-parallel lanes…”

Conference Proceeding

12

DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs by Babaei, Amir Fakhim, Chantem, Thidapat

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… In particular, DARIS improves GPU utilization and uniquely analyzes GPU concurrency by oversubscribing computing resources…”

Conference Proceeding

13

Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs by Wang, Pengyu, Li, Chao, Wang, Jing, Wang, Taolei, Zhang, Lu, Leng, Jingwen, Chen, Quan, Guo, Minyi

Published: IEEE 01.09.2021

Published in 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT) (01.09.2021)
“…Graph sampling and random walk operations, capturing the structural properties of graphs, are playing an important role today as we cannot directly adopt computing-intensive algorithms on large-scale graphs…”

Conference Proceeding

14

MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems by Hsia, Samuel, Golden, Alicia, Acun, Bilge, Ardalani, Newsha, DeVito, Zachary, Wei, Gu-Yeon, Brooks, David, Wu, Carole-Jean

Published: IEEE 29.06.2024

Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024)
“…Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs…”

Conference Proceeding

15

DenSparSA: A Balanced Systolic Array Approach for Dense and Sparse Matrix Multiplication by Wang, Ziheng, Sun, Ruiqi, He, Xin, Ma, Tianrui, Zou, An

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Numerous studies have proposed hardware architectures to accelerate sparse matrix multiplication, but these approaches often incur substantial area and power…”

Conference Proceeding

16

HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference by Zhong, Shuzhang, Sun, Yanfan, Liang, Ling, Wang, Runsheng, Huang, Ru, Li, Meng

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…The Mixture of Experts (MoE) architecture has demonstrated significant advantages as it enables to increase the model capacity without a proportional increase…”

Conference Proceeding

17

NDFT: Accelerating Density Functional Theory Calculations via Hardware/Software Co-Design on Near-Data Computing System by Jiang, Qingcai, Tu, Buxin, Hao, Xiaoyu, Chen, Junshi, An, Hong

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Linear-response time-dependent Density Functional Theory (LR-TDDFT) is a widely used method for accurately predicting the excited-state properties of physical…”

Conference Proceeding

18

ISAVS: Interactive Scalable Analysis and Visualization System by Petruzza, Steve, Venkat, Aniketh, Gyulassy, Attila, Scorzelli, Giorgio, Pascucci, Valerio, Federer, Frederick, Angelucci, Alessandra, Bremer, Peer-Timo

Published: United States 01.11.2017

Published in SIGGRAPH Asia 2017 Symposium on Visualization. SIGGRAPH Asia Symposium on Visualization (2017 : Bangkok, Thailand) (01.11.2017)
“… Furthermore analysis on HPC systems often require complex hand-written parallel implementations of algorithms that suffer from poor portability and maintainability…”

Journal Article

19

SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs by Zhao, Jianqi, Wen, Yao, Luo, Yuchen, Jin, Zhou, Liu, Weifeng, Zhou, Zhenya

Published: IEEE 05.12.2021

Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“…Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs…”

Conference Proceeding

20

OpenDRC: An Efficient Open-Source Design Rule Checking Engine with Hierarchical GPU Acceleration by He, Zhuolun, Zuo, Yihang, Jiang, Jiaxi, Zheng, Haisheng, Ma, Yuzhe, Yu, Bei

Published: IEEE 09.07.2023

Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023)
“… OpenDRC maintains hierarchical layouts with layer-wise bounding volume hierarchies and performs adaptive row-based partition to identify independent regions for check pruning and/or parallel processing…”

Conference Proceeding
