Search results: single instruction multiple data parallel computation model

  1. A GPU-based numerical manifold method for modeling the formation of the excavation damaged zone in deep rock tunnels
    Authors: Liu, Quanshen; Xu, Xiangyu; Wu, Zhijun

    ISSN: 0266-352X, 1873-7633
    Publication details: New York: Elsevier Ltd, 2020-02-01
    Published in: Computers and geotechnics (2020-02-01)
    “…In this study, combined with the zero–thickness cohesive element (ZE) model and explicit integration method, a parallelization technique based on graphics processing units (GPU…”
    Journal Article
  2. Splitwise: Efficient Generative LLM Inference Using Phase Splitting
    Authors: Patel, Pratyush; Choukse, Esha; Zhang, Chaojie; Shah, Aashaka; Goiri, Inigo; Maleki, Saeed; Bianchini, Ricardo

    Publication details: IEEE, 2024-06-29
    “…Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs…”
    Conference Paper
  3. Max-PIM: Fast and Efficient Max/Min Searching in DRAM
    Authors: Zhang, Fan; Angizi, Shaahin; Fan, Deliang

    Publication details: IEEE, 2021-12-05
    “…Recently, in-DRAM computing is becoming one promising technique to address the notorious 'memory-wall' issue for big data processing…”
    Conference Paper
  4. A flexible algorithm for calculating pair interactions on SIMD architectures
    Authors: Páll, Szilárd; Hess, Berk

    ISSN: 0010-4655, 1879-2944
    Publication details: Elsevier B.V., 2013-12-01
    Published in: Computer physics communications (2013-12-01)
    “… In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD…”
    Journal Article
  5. PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type
    Authors: Yang, Ning; Wang, Zongwu; Sun, Qingxiao; Lu, Liqiang; Liu, Fangxin

    Publication details: IEEE, 2025-06-22
    “…Large language models (LLMs) have transformed numerous AI applications, with on-device deployment becoming increasingly important for reducing cloud computing costs and protecting user privacy…”
    Conference Paper
  6. McPAL: Scaling Unstructured Sparse Inference with Multi-Chiplet HBM-PIM Architecture for LLMs
    Authors: Liu, Shiwei; Huang, Zhirui; Yu, Jiangnan; Liu, Qi; Chen, Chixiao

    Publication details: IEEE, 2025-06-22
    “…Large language models (LLMs) have gained significant attention recently. However, executing LLM is memory-bound due to the extensive memory accesses. Process-in-memory (PIM…”
    Conference Paper
  7. KVO-LLM: Boosting Long-Context Generation Throughput for Batched LLM Inference
    Authors: Li, Zhenyu; Lyu, Dongxu; Wang, Gang; Chen, Yuzhou; Chen, Liyan; Li, Wenjie; Jiang, Jianfei; Sun, Yanan; He, Guanghui

    Publication details: IEEE, 2025-06-22
    “…With the widespread deployment of long-context large language models (LLMs), efficient and high-quality generation is becoming increasingly important…”
    Conference Paper
  8. A Novel Wavefront-Based High Parallel Solution for HEVC Encoding
    Authors: Chen, Keji; Sun, Jun; Duan, Yizhou; Guo, Zongming

    ISSN: 1051-8215, 1558-2205
    Publication details: New York: IEEE, 2016-01-01
    “… On data level, optimal single-instruction-multiple-data algorithms are designed for the enhanced coding tools, i.e…”
    Journal Article
  9. MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing
    Authors: Oliveira, Geraldo F.; Olgun, Ataberk; Yaglikci, Abdullah Giray; Bostanci, F. Nisa; Gomez-Luna, Juan; Ghose, Saugata; Mutlu, Onur

    ISSN: 2378-203X
    Publication details: IEEE, 2024-03-02
    “…, 16,384-262,144-bit-wide) data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limit the effectiveness and applicability of PUD in three ways…”
    Conference Paper
  10. PHCG: Optimizing Simulink Code Generation for Embedded System With SIMD Instructions
    Authors: Su, Zhuo; Wang, Dongyan; Yu, Zehong; Yang, Yixiao; Jiang, Yu; Wang, Rui; Chang, Wanli; Li, Wen; Cui, Aiguo; Sun, Jiaguang

    ISSN: 0278-0070, 1937-4151
    Publication details: New York: IEEE, 2023-04-01
    “… In this article, we propose PHCG, an optimized code generator for the Simulink model with single-instruction-multiple-data (SIMD…”
    Journal Article
  11. PairGraph: An Efficient Search-space-aware Accelerator for High-performance Concurrent Pairwise Queries
    Authors: Fu, Yutao; Long, Zhongtian; Zhang, Yu; He, Zirui; Zhao, Jin; Niu, Qiyuan; Wang, Zixiao; Jin, Hai

    Publication details: IEEE, 2025-06-22
    “…Pairwise queries have been widely used in many applications. Although several approaches have been recently proposed to accelerate a single query, they still…”
    Conference Paper
  12. Pipirima: Predicting Patterns in Sparsity to Accelerate Matrix Algebra
    Authors: Bakhtiar, Ubaid; Joo, Donghyeon; Asgari, Bahar

    Publication details: IEEE, 2025-06-22
    “…While sparsity, a feature of data in many applications, provides optimization opportunities such as reducing unnecessary computations, data transfers, and storage, it causes several challenges, too…”
    Conference Paper
  13. PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs
    Authors: Yin, Ruokai; Li, Yuhang; Panda, Priyadarshini

    Publication details: IEEE, 2025-06-22
    “… During deployment on single-instruction-multiple-threads (SIMT) architectures, weights are stored in low-precision integer (INT…”
    Conference Paper
  14. PIMGCN: A ReRAM-Based PIM Design for Graph Convolutional Network Acceleration
    Authors: Yang, Tao; Li, Dongyue; Han, Yibo; Zhao, Yilong; Liu, Fangxin; Liang, Xiaoyao; He, Zhezhi; Jiang, Li

    Publication details: IEEE, 2021-12-05
    “…Graph Convolutional Network (GCN) is a promising but computing- and memory-intensive learning model…”
    Conference Paper
  15. A parallel algorithm for generating bicompatible elimination orderings of proper interval graphs
    Authors: Panda, B.S.; Das, Sajal K.

    ISSN: 0020-0190, 1872-6119
    Publication details: Amsterdam: Elsevier B.V., 2009-08-31
    Published in: Information processing letters (2009-08-31)
    “… (Single Instruction Stream Multiple Data Stream Concurrent Read Concurrent Write Parallel Random Access Machine…”
    Journal Article
  16. EDGE: Event-Driven GPU Execution
    Authors: Hetherington, Tayler Hicklin; Lubeznov, Maria; Shah, Deval; Aamodt, Tor M.

    ISSN: 2641-7936
    Publication details: IEEE, 2019-09-01
    “… supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch preconfigured tasks on a GPU without CPU interaction…”
    Conference Paper
  17. Occamy: Memory-efficient GPU Compiler for DNN Inference
    Authors: Lee, Jaeho; Jeong, Shinnung; Song, Seungbin; Kim, Kunwoo; Choi, Heelim; Kim, Youngsok; Kim, Hanjun

    Publication details: IEEE, 2023-07-09
    “…This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy…”
    Conference Paper
  18. Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators
    Authors: Wang, Fuyu; Shen, Minghua; Ding, Yufei; Xiao, Nong

    Publication details: IEEE, 2024-06-29
    “…Spatial accelerator is a specialized hardware to provide noticeable performance speedup for tensor computations…”
    Conference Paper
  19. Offloaded MPI message matching: an optimistic approach
    Authors: Garcia, Jeronimo S.; Di Girolamo, Salvatore; Kosta, Sokol; Olmos, J.J. Vegas; Nudelman, Rami; Hoefler, Torsten; Bloch, Gil

    Publication details: IEEE, 2024-11-17
    “… In this work, we propose a bin-based MPI message approach, Optimistic Tag Matching, explicitly designed for the lightweight, highly parallel architectures typical of on-path SmartNICs…”
    Conference Paper
  20. Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC
    Authors: Abdulah, Sameh; Cao, Qinglei; Pei, Yu; Bosilca, George; Dongarra, Jack; Genton, Marc G.; Keyes, David E.; Ltaief, Hatem; Sun, Ying

    ISSN: 1045-9219, 1558-2183
    Publication details: New York: IEEE, 2022-04-01
    “… models and optimization of parameters. Spatial data are assumed to possess properties of stationarity or non-stationarity via a kernel fitted to a covariance matrix…”
    Journal Article