Search results - Computing methodologies → Massively parallel algorithms

  1. ANT-MOC: Scalable Neutral Particle Transport Using 3D Method of Characteristics on Multi-GPU Systems
    Authors: Li, Shunde; Wang, Zongguo; Bu, Lingkun; Wang, Jue; Xin, Zhikuang; Li, Shigang; Wang, Yangang; Feng, Yangde; Shi, Peng; Hu, Yun; Chi, Xuebin

    ISSN: 2167-4337
    Publication details: ACM, 2023-11-11
    “…The Method Of Characteristic (MOC) to solve the Neutron Transport Equation (NTE) is the core of full-core simulation for reactors. High resolution is enabled…”
    Get full text
    Conference paper
  2. Automatic Loop Invariant Generation for Data Dependence Analysis
    Authors: Tabar, Asmae Heydari; Bubel, Richard; Hahnle, Reiner

    ISSN: 2575-5099
    Publication details: ACM, 2022-05-01
    “…Parallelization of programs relies on sound and precise analysis of data dependences in the code, specifically, when dealing with loops. State-of-art tools are…”
    Get full text
    Conference paper
  3. In Situ Workload Estimation for Block Assignment and Duplication in Parallelization‐Over‐Data Particle Advection
    Authors: Wang, Zhe; Moreland, Kenneth; Larsen, Matthew; Kress, James; Childs, Hank; Li, Guan; Shan, Guihua; Pugmire, David

    ISSN: 0167-7055, 1467-8659
    Publication details: Oxford: Blackwell Publishing Ltd, 2025-06-01
    Published in Computer Graphics Forum (2025-06-01)
    “…Particle advection is a foundational algorithm for analyzing a flow field. The commonly used Parallelization‐Over‐Data (POD…”
    Get full text
    Journal article
  4. On‐The‐Fly Tracking of Flame Surfaces for the Visual Analysis of Combustion Processes
    Authors: Oster, T.; Abdelsamie, A.; Motejat, M.; Gerrits, T.; Rössl, C.; Thévenin, D.; Theisel, H.

    ISSN: 0167-7055, 1467-8659
    Publication details: Oxford: Blackwell Publishing Ltd, 2018-09-01
    Published in Computer Graphics Forum (2018-09-01)
    “… We present an on‐the‐fly method for tracking the flame surface directly during simulation and computing the local tangential surface deformation for arbitrary time intervals…”
    Get full text
    Journal article
  5. A Halfedge Refinement Rule for Parallel Catmull‐Clark Subdivision
    Authors: Dupuy, J.; Vanhoey, K.

    ISSN: 0167-7055, 1467-8659
    Publication details: Oxford: Blackwell Publishing Ltd, 2021-12-01
    Published in Computer Graphics Forum (2021-12-01)
    “… We leverage these results to derive a novel parallel implementation of Catmull‐Clark subdivision suitable for the GPU…”
    Get full text
    Journal article
  6. Montblanc (https://github.com/ska-sa/montblanc): GPU accelerated radio interferometer measurement equations in support of Bayesian inference for radio observations
    Authors: Perkins, S.J.; Marais, P.C.; Zwart, J.T.L.; Natarajan, I.; Tasse, C.; Smirnov, O.

    ISSN: 2213-1337, 2213-1345
    Publication details: Elsevier B.V., 2015-09-01
    Published in Astronomy and Computing (2015-09-01)
    “… As most of the elements of the RIME and χ² calculation are independent of one another, they are highly amenable to parallel computation…”
    Get full text
    Journal article
  7. Scaling deep learning on GPU and knights landing clusters
    Authors: You, Yang; Buluç, Aydın; Demmel, James

    ISBN: 9781450351140, 145035114X
    ISSN: 2167-4337
    Publication details: New York, NY, USA: ACM, 2017-11-12
    “…) clusters and multi-GPU clusters as our target platforms. From the algorithm aspect, we focus on Elastic Averaging SGD (EASGD…”
    Get full text
    Conference paper
  8. Massively parallel 3D image reconstruction
    Authors: Wang, Xiao; Sabne, Amit; Sakdhnagool, Putt; Kisner, Sherman J.; Bouman, Charles A.; Midkiff, Samuel P.

    ISBN: 9781450351140, 145035114X
    ISSN: 2167-4337
    Publication details: New York, NY, USA: ACM, 2017-11-12
    “… This paper presents a new algorithm for MBIR, the Non-Uniform Parallel Super-Voxel (NU-PSV) algorithm, that regularizes the data access pattern, enables massive parallelism, and ensures fast convergence…”
    Get full text
    Conference paper
  9. Scalable reduction collectives with data partitioning-based multi-leader design
    Authors: Bayatpour, Mohammadreza; Chakraborty, Sourav; Subramoni, Hari; Lu, Xiaoyi; Panda, Dhabaleswar K. (DK)

    ISBN: 9781450351140, 145035114X
    ISSN: 2167-4337
    Publication details: New York, NY, USA: ACM, 2017-11-12
    “…Existing designs for MPI_Allreduce do not take advantage of the vast parallelism available in modern multi-/many-core processors like Intel Xeon/Xeon Phis or…”
    Get full text
    Conference paper
  10. Seed-and-Vote based In-Memory Accelerator for DNA Read Mapping
    Authors: Laguna, Ann Franchesca; Gamaarachchi, Hasindu; Yin, Xunzhao; Niemier, Michael; Parameswaran, Sri; Hu, X. Sharon

    ISSN: 1558-2434
    Publication details: Association for Computing Machinery, 2020-11-02
    “… In-memory computing can help address the memory-bandwidth bottleneck by minimizing data transfers…”
    Get full text
    Conference paper
  11. Gravel: fine-grain GPU-initiated network messages
    Authors: Orr, Marc S.; Che, Shuai; Beckmann, Bradford M.; Oskin, Mark; Reinhardt, Steven K.; Wood, David A.

    ISBN: 9781450351140, 145035114X
    ISSN: 2167-4337
    Publication details: New York, NY, USA: ACM, 2017-11-12
    “… (implemented with CPU threads), which combines messages targeting to the same destination. Gravel leverages diverged work-group-level semantics to amortize synchronization across the GPU's data-parallel lanes…”
    Get full text
    Conference paper
  12. DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs
    Authors: Babaei, Amir Fakhim; Chantem, Thidapat

    Publication details: IEEE, 2025-06-22
    “… In particular, DARIS improves GPU utilization and uniquely analyzes GPU concurrency by oversubscribing computing resources…”
    Get full text
    Conference paper
  13. Skywalker: Efficient Alias-Method-Based Graph Sampling and Random Walk on GPUs
    Authors: Wang, Pengyu; Li, Chao; Wang, Jing; Wang, Taolei; Zhang, Lu; Leng, Jingwen; Chen, Quan; Guo, Minyi

    Publication details: IEEE, 2021-09-01
    “…Graph sampling and random walk operations, capturing the structural properties of graphs, are playing an important role today as we cannot directly adopt computing-intensive algorithms on large-scale graphs…”
    Get full text
    Conference paper
  14. MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
    Authors: Hsia, Samuel; Golden, Alicia; Acun, Bilge; Ardalani, Newsha; DeVito, Zachary; Wei, Gu-Yeon; Brooks, David; Wu, Carole-Jean

    Publication details: IEEE, 2024-06-29
    “…Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs…”
    Get full text
    Conference paper
  15. HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
    Authors: Zhong, Shuzhang; Sun, Yanfan; Liang, Ling; Wang, Runsheng; Huang, Ru; Li, Meng

    Publication details: IEEE, 2025-06-22
    “…The Mixture of Experts (MoE) architecture has demonstrated significant advantages as it enables to increase the model capacity without a proportional increase…”
    Get full text
    Conference paper
  16. DenSparSA: A Balanced Systolic Array Approach for Dense and Sparse Matrix Multiplication
    Authors: Wang, Ziheng; Sun, Ruiqi; He, Xin; Ma, Tianrui; Zou, An

    Publication details: IEEE, 2025-06-22
    “…Numerous studies have proposed hardware architectures to accelerate sparse matrix multiplication, but these approaches often incur substantial area and power…”
    Get full text
    Conference paper
  17. NDFT: Accelerating Density Functional Theory Calculations via Hardware/Software Co-Design on Near-Data Computing System
    Authors: Jiang, Qingcai; Tu, Buxin; Hao, Xiaoyu; Chen, Junshi; An, Hong

    Publication details: IEEE, 2025-06-22
    “…Linear-response time-dependent Density Functional Theory (LR-TDDFT) is a widely used method for accurately predicting the excited-state properties of physical…”
    Get full text
    Conference paper
  18. ISAVS: Interactive Scalable Analysis and Visualization System
    Authors: Petruzza, Steve; Venkat, Aniketh; Gyulassy, Attila; Scorzelli, Giorgio; Pascucci, Valerio; Federer, Frederick; Angelucci, Alessandra; Bremer, Peer-Timo

    Publication details: United States, 2017-11-01
    “… Furthermore analysis on HPC systems often require complex hand-written parallel implementations of algorithms that suffer from poor portability and maintainability…”
    Check access details
    Journal article
  19. SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs
    Authors: Zhao, Jianqi; Wen, Yao; Luo, Yuchen; Jin, Zhou; Liu, Weifeng; Zhou, Zhenya

    Publication details: IEEE, 2021-12-05
    “…Sparse LU factorization is one of the key building blocks of sparse direct solvers and often dominates the computing time of circuit simulation programs…”
    Get full text
    Conference paper
  20. OpenDRC: An Efficient Open-Source Design Rule Checking Engine with Hierarchical GPU Acceleration
    Authors: He, Zhuolun; Zuo, Yihang; Jiang, Jiaxi; Zheng, Haisheng; Ma, Yuzhe; Yu, Bei

    Publication details: IEEE, 2023-07-09
    “… OpenDRC maintains hierarchical layouts with layer-wise bounding volume hierarchies and performs adaptive row-based partition to identify independent regions for check pruning and/or parallel processing…”
    Get full text
    Conference paper