Search Results - single instruction multiple data parallel computation model

  1.

    A GPU-based numerical manifold method for modeling the formation of the excavation damaged zone in deep rock tunnels by Liu, Quanshen, Xu, Xiangyu, Wu, Zhijun

    ISSN: 0266-352X, 1873-7633
    Published: New York Elsevier Ltd 01.02.2020
    Published in Computers and geotechnics (01.02.2020)
    “…In this study, combined with the zero–thickness cohesive element (ZE) model and explicit integration method, a parallelization technique based on graphics processing units (GPU…”
    Journal Article
  2.

    Splitwise: Efficient Generative LLM Inference Using Phase Splitting by Patel, Pratyush, Choukse, Esha, Zhang, Chaojie, Shah, Aashaka, Goiri, Inigo, Maleki, Saeed, Bianchini, Ricardo

    Published: IEEE 29.06.2024
    “…Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs…”
    Conference Proceeding
  3.

    Max-PIM: Fast and Efficient Max/Min Searching in DRAM by Zhang, Fan, Angizi, Shaahin, Fan, Deliang

    Published: IEEE 05.12.2021
    “…Recently, in-DRAM computing is becoming one promising technique to address the notorious 'memory-wall' issue for big data processing…”
    Conference Proceeding
  4.

    A flexible algorithm for calculating pair interactions on SIMD architectures by Páll, Szilárd, Hess, Berk

    ISSN: 0010-4655, 1879-2944
    Published: Elsevier B.V 01.12.2013
    Published in Computer physics communications (01.12.2013)
    “… In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD…”
    Journal Article
  5.

    PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type by Yang, Ning, Wang, Zongwu, Sun, Qingxiao, Lu, Liqiang, Liu, Fangxin

    Published: IEEE 22.06.2025
    “…Large language models (LLMs) have transformed numerous AI applications, with on-device deployment becoming increasingly important for reducing cloud computing costs and protecting user privacy…”
    Conference Proceeding
  6.

    McPAL: Scaling Unstructured Sparse Inference with Multi-Chiplet HBM-PIM Architecture for LLMs by Liu, Shiwei, Huang, Zhirui, Yu, Jiangnan, Liu, Qi, Chen, Chixiao

    Published: IEEE 22.06.2025
    “…Large language models (LLMs) have gained significant attention recently. However, executing LLM is memory-bound due to the extensive memory accesses. Process-in-memory (PIM…”
    Conference Proceeding
  7.

    KVO-LLM: Boosting Long-Context Generation Throughput for Batched LLM Inference by Li, Zhenyu, Lyu, Dongxu, Wang, Gang, Chen, Yuzhou, Chen, Liyan, Li, Wenjie, Jiang, Jianfei, Sun, Yanan, He, Guanghui

    Published: IEEE 22.06.2025
    “…With the widespread deployment of long-context large language models (LLMs), efficient and high-quality generation is becoming increasingly important…”
    Conference Proceeding
  8.

    A Novel Wavefront-Based High Parallel Solution for HEVC Encoding by Chen, Keji, Sun, Jun, Duan, Yizhou, Guo, Zongming

    ISSN: 1051-8215, 1558-2205
    Published: New York IEEE 01.01.2016
    “… On data level, optimal single-instruction-multiple-data algorithms are designed for the enhanced coding tools, i.e…”
    Journal Article
  9.

    MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing by Oliveira, Geraldo F., Olgun, Ataberk, Yaglikci, Abdullah Giray, Bostanci, F. Nisa, Gomez-Luna, Juan, Ghose, Saugata, Mutlu, Onur

    ISSN: 2378-203X
    Published: IEEE 02.03.2024
    “…, 16,384-262,144-bit-wide) data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limit the effectiveness and applicability of PUD in three ways…”
    Conference Proceeding
  10.

    PHCG: Optimizing Simulink Code Generation for Embedded System With SIMD Instructions by Su, Zhuo, Wang, Dongyan, Yu, Zehong, Yang, Yixiao, Jiang, Yu, Wang, Rui, Chang, Wanli, Li, Wen, Cui, Aiguo, Sun, Jiaguang

    ISSN: 0278-0070, 1937-4151
    Published: New York IEEE 01.04.2023
    “… In this article, we propose PHCG, an optimized code generator for the Simulink model with single-instruction-multiple-data (SIMD…”
    Journal Article
  11.

    PairGraph: An Efficient Search-space-aware Accelerator for High-performance Concurrent Pairwise Queries by Fu, Yutao, Long, Zhongtian, Zhang, Yu, He, Zirui, Zhao, Jin, Niu, Qiyuan, Wang, Zixiao, Jin, Hai

    Published: IEEE 22.06.2025
    “…Pairwise queries have been widely used in many applications. Although several approaches have been recently proposed to accelerate a single query, they still…”
    Conference Proceeding
  12.

    Pipirima: Predicting Patterns in Sparsity to Accelerate Matrix Algebra by Bakhtiar, Ubaid, Joo, Donghyeon, Asgari, Bahar

    Published: IEEE 22.06.2025
    “…While sparsity, a feature of data in many applications, provides optimization opportunities such as reducing unnecessary computations, data transfers, and storage, it causes several challenges, too…”
    Conference Proceeding
  13.

    PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs by Yin, Ruokai, Li, Yuhang, Panda, Priyadarshini

    Published: IEEE 22.06.2025
    “… During deployment on single-instruction-multiple-threads (SIMT) architectures, weights are stored in low-precision integer (INT…”
    Conference Proceeding
  14.

    PIMGCN: A ReRAM-Based PIM Design for Graph Convolutional Network Acceleration by Yang, Tao, Li, Dongyue, Han, Yibo, Zhao, Yilong, Liu, Fangxin, Liang, Xiaoyao, He, Zhezhi, Jiang, Li

    Published: IEEE 05.12.2021
    “…Graph Convolutional Network (GCN) is a promising but computing- and memory-intensive learning model…”
    Conference Proceeding
  15.

    A parallel algorithm for generating bicompatible elimination orderings of proper interval graphs by Panda, B.S., Das, Sajal K.

    ISSN: 0020-0190, 1872-6119
    Published: Amsterdam Elsevier B.V 31.08.2009
    Published in Information processing letters (31.08.2009)
    “… (Single Instruction Stream Multiple Data Stream Concurrent Read Concurrent Write Parallel Random Access Machine…”
    Journal Article
  16.

    EDGE: Event-Driven GPU Execution by Hetherington, Tayler Hicklin, Lubeznov, Maria, Shah, Deval, Aamodt, Tor M.

    ISSN: 2641-7936
    Published: IEEE 01.09.2019
    “… supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch preconfigured tasks on a GPU without CPU interaction…”
    Conference Proceeding
  17.

    Occamy: Memory-efficient GPU Compiler for DNN Inference by Lee, Jaeho, Jeong, Shinnung, Song, Seungbin, Kim, Kunwoo, Choi, Heelim, Kim, Youngsok, Kim, Hanjun

    Published: IEEE 09.07.2023
    “…This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy…”
    Conference Proceeding
  18.

    Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators by Wang, Fuyu, Shen, Minghua, Ding, Yufei, Xiao, Nong

    Published: IEEE 29.06.2024
    “…Spatial accelerator is a specialized hardware to provide noticeable performance speedup for tensor computations…”
    Conference Proceeding
  19.

    Offloaded MPI message matching: an optimistic approach by Garcia, Jeronimo S., Di Girolamo, Salvatore, Kosta, Sokol, Olmos, J.J. Vegas, Nudelman, Rami, Hoefler, Torsten, Bloch, Gil

    Published: IEEE 17.11.2024
    “… In this work, we propose a bin-based MPI message approach, Optimistic Tag Matching, explicitly designed for the lightweight, highly parallel architectures typical of on-path SmartNICs…”
    Conference Proceeding
  20.

    Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC by Abdulah, Sameh, Cao, Qinglei, Pei, Yu, Bosilca, George, Dongarra, Jack, Genton, Marc G., Keyes, David E., Ltaief, Hatem, Sun, Ying

    ISSN: 1045-9219, 1558-2183
    Published: New York IEEE 01.04.2022
    “… models and optimization of parameters. Spatial data are assumed to possess properties of stationarity or non-stationarity via a kernel fitted to a covariance matrix…”
    Journal Article