Search Results - Computing methodologies→Massively parallel algorithms

  1.

    ANT-MOC: Scalable Neutral Particle Transport Using 3D Method of Characteristics on Multi-GPU Systems by Li, Shunde, Wang, Zongguo, Bu, Lingkun, Wang, Jue, Xin, Zhikuang, Li, Shigang, Wang, Yangang, Feng, Yangde, Shi, Peng, Hu, Yun, Chi, Xuebin

    ISSN: 2167-4337
    Published: ACM 11.11.2023
    “…The Method Of Characteristic (MOC) to solve the Neutron Transport Equation (NTE) is the core of full-core simulation for reactors. High resolution is enabled…”
    Conference Proceeding
  2.

    Automatic Loop Invariant Generation for Data Dependence Analysis by Tabar, Asmae Heydari, Bubel, Richard, Hahnle, Reiner

    ISSN: 2575-5099
    Published: ACM 01.05.2022
    “…Parallelization of programs relies on sound and precise analysis of data dependences in the code, specifically, when dealing with loops. State-of-art tools are…”
    Conference Proceeding
  3.

    In Situ Workload Estimation for Block Assignment and Duplication in Parallelization‐Over‐Data Particle Advection by Wang, Zhe, Moreland, Kenneth, Larsen, Matthew, Kress, James, Childs, Hank, Li, Guan, Shan, Guihua, Pugmire, David

    ISSN: 0167-7055, 1467-8659
    Published: Oxford Blackwell Publishing Ltd 01.06.2025
    Published in Computer graphics forum (01.06.2025)
    “…Particle advection is a foundational algorithm for analyzing a flow field. The commonly used Parallelization‐Over‐Data (POD…”
    Journal Article
  4.

    On‐The‐Fly Tracking of Flame Surfaces for the Visual Analysis of Combustion Processes by Oster, T., Abdelsamie, A., Motejat, M., Gerrits, T., Rössl, C., Thévenin, D., Theisel, H.

    ISSN: 0167-7055, 1467-8659
    Published: Oxford Blackwell Publishing Ltd 01.09.2018
    Published in Computer graphics forum (01.09.2018)
    “… We present an on‐the‐fly method for tracking the flame surface directly during simulation and computing the local tangential surface deformation for arbitrary time intervals…”
    Journal Article
  5.

    A Halfedge Refinement Rule for Parallel Catmull‐Clark Subdivision by Dupuy, J., Vanhoey, K.

    ISSN: 0167-7055, 1467-8659
    Published: Oxford Blackwell Publishing Ltd 01.12.2021
    Published in Computer graphics forum (01.12.2021)
    “… We leverage these results to derive a novel parallel implementation of Catmull‐Clark subdivision suitable for the GPU…”
    Journal Article
  6.

    Montblanc (https://github.com/ska-sa/montblanc): GPU accelerated radio interferometer measurement equations in support of Bayesian inference for radio observations by Perkins, S.J., Marais, P.C., Zwart, J.T.L., Natarajan, I., Tasse, C., Smirnov, O.

    ISSN: 2213-1337, 2213-1345
    Published: Elsevier B.V 01.09.2015
    Published in Astronomy and computing (01.09.2015)
    “… As most of the elements of the RIME and χ² calculation are independent of one another, they are highly amenable to parallel computation…”
    Journal Article
  7.

    Scaling deep learning on GPU and knights landing clusters by You, Yang, Buluç, Aydın, Demmel, James

    ISBN: 9781450351140, 145035114X
    ISSN: 2167-4337
    Published: New York, NY, USA ACM 12.11.2017
    “…) clusters and multi-GPU clusters as our target platforms. From the algorithm aspect, we focus on Elastic Averaging SGD (EASGD…”
    Conference Proceeding
  8.

    Massively parallel 3D image reconstruction by Wang, Xiao, Sabne, Amit, Sakdhnagool, Putt, Kisner, Sherman J., Bouman, Charles A., Midkiff, Samuel P.

    ISBN: 9781450351140, 145035114X
    ISSN: 2167-4337
    Published: New York, NY, USA ACM 12.11.2017
    “… This paper presents a new algorithm for MBIR, the Non-Uniform Parallel Super-Voxel (NU-PSV) algorithm, that regularizes the data access pattern, enables massive parallelism, and ensures fast convergence…”
    Conference Proceeding
  9.

    Scalable reduction collectives with data partitioning-based multi-leader design by Bayatpour, Mohammadreza, Chakraborty, Sourav, Subramoni, Hari, Lu, Xiaoyi, Panda, Dhabaleswar K. (DK)

    ISBN: 9781450351140, 145035114X
    ISSN: 2167-4337
    Published: New York, NY, USA ACM 12.11.2017
    “…Existing designs for MPI_Allreduce do not take advantage of the vast parallelism available in modern multi-/many-core processors like Intel Xeon/Xeon Phis or…”
    Conference Proceeding
  10.

    Seed-and-Vote based In-Memory Accelerator for DNA Read Mapping by Laguna, Ann Franchesca, Gamaarachchi, Hasindu, Yin, Xunzhao, Niemier, Michael, Parameswaran, Sri, Hu, X. Sharon

    ISSN: 1558-2434
    Published: Association for Computing Machinery 02.11.2020
    “… In-memory computing can help address the memory-bandwidth bottleneck by minimizing data transfers…”
    Conference Proceeding
  11.

    Gravel: fine-grain GPU-initiated network messages by Orr, Marc S., Che, Shuai, Beckmann, Bradford M., Oskin, Mark, Reinhardt, Steven K., Wood, David A.

    ISBN: 9781450351140, 145035114X
    ISSN: 2167-4337
    Published: New York, NY, USA ACM 12.11.2017
    “… (implemented with CPU threads), which combines messages targeting to the same destination. Gravel leverages diverged work-group-level semantics to amortize synchronization across the GPU's data-parallel lanes…”
    Conference Proceeding
  12.

    DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs by Babaei, Amir Fakhim, Chantem, Thidapat

    Published: IEEE 22.06.2025
    “… In particular, DARIS improves GPU utilization and uniquely analyzes GPU concurrency by oversubscribing computing resources…”
    Conference Proceeding
  13.

    Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs by Wang, Pengyu, Li, Chao, Wang, Jing, Wang, Taolei, Zhang, Lu, Leng, Jingwen, Chen, Quan, Guo, Minyi

    Published: IEEE 01.09.2021
    “…Graph sampling and random walk operations, capturing the structural properties of graphs, are playing an important role today as we cannot directly adopt computing-intensive algorithms on large-scale graphs…”
    Conference Proceeding
  14.

    MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems by Hsia, Samuel, Golden, Alicia, Acun, Bilge, Ardalani, Newsha, DeVito, Zachary, Wei, Gu-Yeon, Brooks, David, Wu, Carole-Jean

    Published: IEEE 29.06.2024
    “…Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs…”
    Conference Proceeding
  15.

    DenSparSA: A Balanced Systolic Array Approach for Dense and Sparse Matrix Multiplication by Wang, Ziheng, Sun, Ruiqi, He, Xin, Ma, Tianrui, Zou, An

    Published: IEEE 22.06.2025
    “…Numerous studies have proposed hardware architectures to accelerate sparse matrix multiplication, but these approaches often incur substantial area and power…”
    Conference Proceeding
  16.

    HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference by Zhong, Shuzhang, Sun, Yanfan, Liang, Ling, Wang, Runsheng, Huang, Ru, Li, Meng

    Published: IEEE 22.06.2025
    “…The Mixture of Experts (MoE) architecture has demonstrated significant advantages as it enables to increase the model capacity without a proportional increase…”
    Conference Proceeding
  17.

    NDFT: Accelerating Density Functional Theory Calculations via Hardware/Software Co-Design on Near-Data Computing System by Jiang, Qingcai, Tu, Buxin, Hao, Xiaoyu, Chen, Junshi, An, Hong

    Published: IEEE 22.06.2025
    “…Linear-response time-dependent Density Functional Theory (LR-TDDFT) is a widely used method for accurately predicting the excited-state properties of physical…”
    Conference Proceeding
  18.

    ISAVS: Interactive Scalable Analysis and Visualization System by Petruzza, Steve, Venkat, Aniketh, Gyulassy, Attila, Scorzelli, Giorgio, Pascucci, Valerio, Federer, Frederick, Angelucci, Alessandra, Bremer, Peer-Timo

    Published: United States 01.11.2017
    “… Furthermore analysis on HPC systems often require complex hand-written parallel implementations of algorithms that suffer from poor portability and maintainability…”
    Journal Article
  19.

    SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs by Zhao, Jianqi, Wen, Yao, Luo, Yuchen, Jin, Zhou, Liu, Weifeng, Zhou, Zhenya

    Published: IEEE 05.12.2021
    “…Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs…”
    Conference Proceeding
  20.

    OpenDRC: An Efficient Open-Source Design Rule Checking Engine with Hierarchical GPU Acceleration by He, Zhuolun, Zuo, Yihang, Jiang, Jiaxi, Zheng, Haisheng, Ma, Yuzhe, Yu, Bei

    Published: IEEE 09.07.2023
    “… OpenDRC maintains hierarchical layouts with layer-wise bounding volume hierarchies and performs adaptive row-based partition to identify independent regions for check pruning and/or parallel processing…”
    Conference Proceeding