Search results - single instruction multiple data parallel computation model

  1.

    A GPU-based numerical manifold method for modeling the formation of the excavation damaged zone in deep rock tunnels
    Authors: Liu, Quanshen; Xu, Xiangyu; Wu, Zhijun

    ISSN: 0266-352X, 1873-7633
    Published: New York, Elsevier Ltd, 01.02.2020
    Published in: Computers and geotechnics (01.02.2020)
    “…In this study, combined with the zero–thickness cohesive element (ZE) model and explicit integration method, a parallelization technique based on graphics processing units (GPU…”
    Journal Article
  2.

    Splitwise: Efficient Generative LLM Inference Using Phase Splitting
    Authors: Patel, Pratyush; Choukse, Esha; Zhang, Chaojie; Shah, Aashaka; Goiri, Inigo; Maleki, Saeed; Bianchini, Ricardo

    Published: IEEE, 29.06.2024
    “…Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs…”
    Conference Paper
  3.

    Max-PIM: Fast and Efficient Max/Min Searching in DRAM
    Authors: Zhang, Fan; Angizi, Shaahin; Fan, Deliang

    Published: IEEE, 05.12.2021
    “…Recently, in-DRAM computing is becoming one promising technique to address the notorious 'memory-wall' issue for big data processing…”
    Conference Paper
  4.

    A flexible algorithm for calculating pair interactions on SIMD architectures
    Authors: Páll, Szilárd; Hess, Berk

    ISSN: 0010-4655, 1879-2944
    Published: Elsevier B.V, 01.12.2013
    Published in: Computer physics communications (01.12.2013)
    “… In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD…”
    Journal Article
  5.

    PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type
    Authors: Yang, Ning; Wang, Zongwu; Sun, Qingxiao; Lu, Liqiang; Liu, Fangxin

    Published: IEEE, 22.06.2025
    “…Large language models (LLMs) have transformed numerous AI applications, with on-device deployment becoming increasingly important for reducing cloud computing costs and protecting user privacy…”
    Conference Paper
  6.

    McPAL: Scaling Unstructured Sparse Inference with Multi-Chiplet HBM-PIM Architecture for LLMs
    Authors: Liu, Shiwei; Huang, Zhirui; Yu, Jiangnan; Liu, Qi; Chen, Chixiao

    Published: IEEE, 22.06.2025
    “…Large language models (LLMs) have gained significant attention recently. However, executing LLM is memory-bound due to the extensive memory accesses. Process-in-memory (PIM…”
    Conference Paper
  7.

    KVO-LLM: Boosting Long-Context Generation Throughput for Batched LLM Inference
    Authors: Li, Zhenyu; Lyu, Dongxu; Wang, Gang; Chen, Yuzhou; Chen, Liyan; Li, Wenjie; Jiang, Jianfei; Sun, Yanan; He, Guanghui

    Published: IEEE, 22.06.2025
    “…With the widespread deployment of long-context large language models (LLMs), efficient and high-quality generation is becoming increasingly important…”
    Conference Paper
  8.

    A Novel Wavefront-Based High Parallel Solution for HEVC Encoding
    Authors: Chen, Keji; Sun, Jun; Duan, Yizhou; Guo, Zongming

    ISSN: 1051-8215, 1558-2205
    Published: New York, IEEE, 01.01.2016
    “… On data level, optimal single-instruction-multiple-data algorithms are designed for the enhanced coding tools, i.e…”
    Journal Article
  9.

    MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing
    Authors: Oliveira, Geraldo F.; Olgun, Ataberk; Yaglikci, Abdullah Giray; Bostanci, F. Nisa; Gomez-Luna, Juan; Ghose, Saugata; Mutlu, Onur

    ISSN: 2378-203X
    Published: IEEE, 02.03.2024
    “…, 16,384-262,144-bit-wide) data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limit the effectiveness and applicability of PUD in three ways…”
    Conference Paper
  10.

    PHCG: Optimizing Simulink Code Generation for Embedded System With SIMD Instructions
    Authors: Su, Zhuo; Wang, Dongyan; Yu, Zehong; Yang, Yixiao; Jiang, Yu; Wang, Rui; Chang, Wanli; Li, Wen; Cui, Aiguo; Sun, Jiaguang

    ISSN: 0278-0070, 1937-4151
    Published: New York, IEEE, 01.04.2023
    “… In this article, we propose PHCG, an optimized code generator for the Simulink model with single-instruction-multiple-data (SIMD…”
    Journal Article
  11.

    PairGraph: An Efficient Search-space-aware Accelerator for High-performance Concurrent Pairwise Queries
    Authors: Fu, Yutao; Long, Zhongtian; Zhang, Yu; He, Zirui; Zhao, Jin; Niu, Qiyuan; Wang, Zixiao; Jin, Hai

    Published: IEEE, 22.06.2025
    “…Pairwise queries have been widely used in many applications. Although several approaches have been recently proposed to accelerate a single query, they still…”
    Conference Paper
  12.

    Pipirima: Predicting Patterns in Sparsity to Accelerate Matrix Algebra
    Authors: Bakhtiar, Ubaid; Joo, Donghyeon; Asgari, Bahar

    Published: IEEE, 22.06.2025
    “…While sparsity, a feature of data in many applications, provides optimization opportunities such as reducing unnecessary computations, data transfers, and storage, it causes several challenges, too…”
    Conference Paper
  13.

    PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs
    Authors: Yin, Ruokai; Li, Yuhang; Panda, Priyadarshini

    Published: IEEE, 22.06.2025
    “… During deployment on single-instruction-multiple-threads (SIMT) architectures, weights are stored in low-precision integer (INT…”
    Conference Paper
  14.

    PIMGCN: A ReRAM-Based PIM Design for Graph Convolutional Network Acceleration
    Authors: Yang, Tao; Li, Dongyue; Han, Yibo; Zhao, Yilong; Liu, Fangxin; Liang, Xiaoyao; He, Zhezhi; Jiang, Li

    Published: IEEE, 05.12.2021
    “…Graph Convolutional Network (GCN) is a promising but computing- and memory-intensive learning model…”
    Conference Paper
  15.

    A parallel algorithm for generating bicompatible elimination orderings of proper interval graphs
    Authors: Panda, B.S.; Das, Sajal K.

    ISSN: 0020-0190, 1872-6119
    Published: Amsterdam, Elsevier B.V, 31.08.2009
    Published in: Information processing letters (31.08.2009)
    “… (Single Instruction Stream Multiple Data Stream Concurrent Read Concurrent Write Parallel Random Access Machine…”
    Journal Article
  16.

    EDGE: Event-Driven GPU Execution
    Authors: Hetherington, Tayler Hicklin; Lubeznov, Maria; Shah, Deval; Aamodt, Tor M.

    ISSN: 2641-7936
    Published: IEEE, 01.09.2019
    “… supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch preconfigured tasks on a GPU without CPU interaction…”
    Conference Paper
  17.

    Occamy: Memory-efficient GPU Compiler for DNN Inference
    Authors: Lee, Jaeho; Jeong, Shinnung; Song, Seungbin; Kim, Kunwoo; Choi, Heelim; Kim, Youngsok; Kim, Hanjun

    Published: IEEE, 09.07.2023
    “…This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy…”
    Conference Paper
  18.

    Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators
    Authors: Wang, Fuyu; Shen, Minghua; Ding, Yufei; Xiao, Nong

    Published: IEEE, 29.06.2024
    “…Spatial accelerator is a specialized hardware to provide noticeable performance speedup for tensor computations…”
    Conference Paper
  19.

    Offloaded MPI message matching: an optimistic approach
    Authors: Garcia, Jeronimo S.; Di Girolamo, Salvatore; Kosta, Sokol; Olmos, J.J. Vegas; Nudelman, Rami; Hoefler, Torsten; Bloch, Gil

    Published: IEEE, 17.11.2024
    “… In this work, we propose a bin-based MPI message approach, Optimistic Tag Matching, explicitly designed for the lightweight, highly parallel architectures typical of on-path SmartNICs…”
    Conference Paper
  20.

    Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC
    Authors: Abdulah, Sameh; Cao, Qinglei; Pei, Yu; Bosilca, George; Dongarra, Jack; Genton, Marc G.; Keyes, David E.; Ltaief, Hatem; Sun, Ying

    ISSN: 1045-9219, 1558-2183
    Published: New York, IEEE, 01.04.2022
    “… models and optimization of parameters. Spatial data are assumed to possess properties of stationarity or non-stationarity via a kernel fitted to a covariance matrix…”
    Journal Article