Search results - single instruction multiple data parallel computation model

  1. A GPU-based numerical manifold method for modeling the formation of the excavation damaged zone in deep rock tunnels, by Liu, Quanshen; Xu, Xiangyu; Wu, Zhijun

    ISSN: 0266-352X, 1873-7633
    Published: New York, Elsevier Ltd, 1 February 2020
    Published in: Computers and geotechnics (1 February 2020)
    “… In this study, combined with the zero–thickness cohesive element (ZE) model and explicit integration method, a parallelization technique based on graphics processing units (GPU …”
    Full text
    Journal Article
  2. Splitwise: Efficient Generative LLM Inference Using Phase Splitting, by Patel, Pratyush; Choukse, Esha; Zhang, Chaojie; Shah, Aashaka; Goiri, Inigo; Maleki, Saeed; Bianchini, Ricardo

    Published: IEEE, 29 June 2024
    “… Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs …”
    Full text
    Conference proceeding
  3. Max-PIM: Fast and Efficient Max/Min Searching in DRAM, by Zhang, Fan; Angizi, Shaahin; Fan, Deliang

    Published: IEEE, 5 December 2021
    “… Recently, in-DRAM computing is becoming one promising technique to address the notorious 'memory-wall' issue for big data processing …”
    Full text
    Conference proceeding
  4. A flexible algorithm for calculating pair interactions on SIMD architectures, by Páll, Szilárd; Hess, Berk

    ISSN: 0010-4655, 1879-2944
    Published: Elsevier B.V., 1 December 2013
    Published in: Computer physics communications (1 December 2013)
    “… In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD …”
    Full text
    Journal Article
  5. PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type, by Yang, Ning; Wang, Zongwu; Sun, Qingxiao; Lu, Liqiang; Liu, Fangxin

    Published: IEEE, 22 June 2025
    “… Large language models (LLMs) have transformed numerous AI applications, with on-device deployment becoming increasingly important for reducing cloud computing costs and protecting user privacy …”
    Full text
    Conference proceeding
  6. McPAL: Scaling Unstructured Sparse Inference with Multi-Chiplet HBM-PIM Architecture for LLMs, by Liu, Shiwei; Huang, Zhirui; Yu, Jiangnan; Liu, Qi; Chen, Chixiao

    Published: IEEE, 22 June 2025
    “… Large language models (LLMs) have gained significant attention recently. However, executing LLM is memory-bound due to the extensive memory accesses. Process-in-memory (PIM …”
    Full text
    Conference proceeding
  7. KVO-LLM: Boosting Long-Context Generation Throughput for Batched LLM Inference, by Li, Zhenyu; Lyu, Dongxu; Wang, Gang; Chen, Yuzhou; Chen, Liyan; Li, Wenjie; Jiang, Jianfei; Sun, Yanan; He, Guanghui

    Published: IEEE, 22 June 2025
    “… With the widespread deployment of long-context large language models (LLMs), efficient and high-quality generation is becoming increasingly important …”
    Full text
    Conference proceeding
  8. A Novel Wavefront-Based High Parallel Solution for HEVC Encoding, by Chen, Keji; Sun, Jun; Duan, Yizhou; Guo, Zongming

    ISSN: 1051-8215, 1558-2205
    Published: New York, IEEE, 1 January 2016
    “… On data level, optimal single-instruction-multiple-data algorithms are designed for the enhanced coding tools, i.e …”
    Full text
    Journal Article
  9. MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing, by Oliveira, Geraldo F.; Olgun, Ataberk; Yaglikci, Abdullah Giray; Bostanci, F. Nisa; Gomez-Luna, Juan; Ghose, Saugata; Mutlu, Onur

    ISSN: 2378-203X
    Published: IEEE, 2 March 2024
    “… , 16,384-262,144-bit-wide) data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limit the effectiveness and applicability of PUD in three ways …”
    Full text
    Conference proceeding
  10. PHCG: Optimizing Simulink Code Generation for Embedded System With SIMD Instructions, by Su, Zhuo; Wang, Dongyan; Yu, Zehong; Yang, Yixiao; Jiang, Yu; Wang, Rui; Chang, Wanli; Li, Wen; Cui, Aiguo; Sun, Jiaguang

    ISSN: 0278-0070, 1937-4151
    Published: New York, IEEE, 1 April 2023
    “… In this article, we propose PHCG, an optimized code generator for the Simulink model with single-instruction-multiple-data (SIMD …”
    Full text
    Journal Article
  11. PairGraph: An Efficient Search-space-aware Accelerator for High-performance Concurrent Pairwise Queries, by Fu, Yutao; Long, Zhongtian; Zhang, Yu; He, Zirui; Zhao, Jin; Niu, Qiyuan; Wang, Zixiao; Jin, Hai

    Published: IEEE, 22 June 2025
    “… Pairwise queries have been widely used in many applications. Although several approaches have been recently proposed to accelerate a single query, they still …”
    Full text
    Conference proceeding
  12. Pipirima: Predicting Patterns in Sparsity to Accelerate Matrix Algebra, by Bakhtiar, Ubaid; Joo, Donghyeon; Asgari, Bahar

    Published: IEEE, 22 June 2025
    “… While sparsity, a feature of data in many applications, provides optimization opportunities such as reducing unnecessary computations, data transfers, and storage, it causes several challenges, too …”
    Full text
    Conference proceeding
  13. PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs, by Yin, Ruokai; Li, Yuhang; Panda, Priyadarshini

    Published: IEEE, 22 June 2025
    “… During deployment on single-instruction-multiple-threads (SIMT) architectures, weights are stored in low-precision integer (INT …”
    Full text
    Conference proceeding
  14. PIMGCN: A ReRAM-Based PIM Design for Graph Convolutional Network Acceleration, by Yang, Tao; Li, Dongyue; Han, Yibo; Zhao, Yilong; Liu, Fangxin; Liang, Xiaoyao; He, Zhezhi; Jiang, Li

    Published: IEEE, 5 December 2021
    “… Graph Convolutional Network (GCN) is a promising but computing- and memory-intensive learning model …”
    Full text
    Conference proceeding
  15. A parallel algorithm for generating bicompatible elimination orderings of proper interval graphs, by Panda, B.S.; Das, Sajal K.

    ISSN: 0020-0190, 1872-6119
    Published: Amsterdam, Elsevier B.V., 31 August 2009
    Published in: Information processing letters (31 August 2009)
    “… (Single Instruction Stream Multiple Data Stream Concurrent Read Concurrent Write Parallel Random Access Machine …”
    Full text
    Journal Article
  16. EDGE: Event-Driven GPU Execution, by Hetherington, Tayler Hicklin; Lubeznov, Maria; Shah, Deval; Aamodt, Tor M.

    ISSN: 2641-7936
    Published: IEEE, 1 September 2019
    “… supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch preconfigured tasks on a GPU without CPU interaction …”
    Full text
    Conference proceeding
  17. Occamy: Memory-efficient GPU Compiler for DNN Inference, by Lee, Jaeho; Jeong, Shinnung; Song, Seungbin; Kim, Kunwoo; Choi, Heelim; Kim, Youngsok; Kim, Hanjun

    Published: IEEE, 9 July 2023
    “… This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy …”
    Full text
    Conference proceeding
  18. Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators, by Wang, Fuyu; Shen, Minghua; Ding, Yufei; Xiao, Nong

    Published: IEEE, 29 June 2024
    “… Spatial accelerator is a specialized hardware to provide noticeable performance speedup for tensor computations …”
    Full text
    Conference proceeding
  19. Offloaded MPI message matching: an optimistic approach, by Garcia, Jeronimo S.; Di Girolamo, Salvatore; Kosta, Sokol; Olmos, J.J. Vegas; Nudelman, Rami; Hoefler, Torsten; Bloch, Gil

    Published: IEEE, 17 November 2024
    “… In this work, we propose a bin-based MPI message approach, Optimistic Tag Matching, explicitly designed for the lightweight, highly parallel architectures typical of on-path SmartNICs …”
    Full text
    Conference proceeding
  20. Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC, by Abdulah, Sameh; Cao, Qinglei; Pei, Yu; Bosilca, George; Dongarra, Jack; Genton, Marc G.; Keyes, David E.; Ltaief, Hatem; Sun, Ying

    ISSN: 1045-9219, 1558-2183
    Published: New York, IEEE, 1 April 2022
    “… models and optimization of parameters. Spatial data are assumed to possess properties of stationarity or non-stationarity via a kernel fitted to a covariance matrix …”
    Full text
    Journal Article