Search Results - single instruction multiple data parallel computation model

1

A GPU-based numerical manifold method for modeling the formation of the excavation damaged zone in deep rock tunnels by Liu, Quanshen, Xu, Xiangyu, Wu, Zhijun

ISSN: 0266-352X, 1873-7633

Published: New York Elsevier Ltd 01.02.2020

Published in Computers and geotechnics (01.02.2020)
“…In this study, combined with the zero–thickness cohesive element (ZE) model and explicit integration method, a parallelization technique based on graphics processing units (GPU…”

Journal Article

2

Splitwise: Efficient Generative LLM Inference Using Phase Splitting by Patel, Pratyush, Choukse, Esha, Zhang, Chaojie, Shah, Aashaka, Goiri, Inigo, Maleki, Saeed, Bianchini, Ricardo

Published: IEEE 29.06.2024

Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024)
“…Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs…”

Conference Proceeding

3

Max-PIM: Fast and Efficient Max/Min Searching in DRAM by Zhang, Fan, Angizi, Shaahin, Fan, Deliang

Published: IEEE 05.12.2021

Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“…Recently, in-DRAM computing is becoming one promising technique to address the notorious 'memory-wall' issue for big data processing…”

Conference Proceeding

4

A flexible algorithm for calculating pair interactions on SIMD architectures by Páll, Szilárd, Hess, Berk

ISSN: 0010-4655, 1879-2944

Published: Elsevier B.V 01.12.2013

Published in Computer physics communications (01.12.2013)
“… In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD…”

Journal Article

5

PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type by Yang, Ning, Wang, Zongwu, Sun, Qingxiao, Lu, Liqiang, Liu, Fangxin

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Large language models (LLMs) have transformed numerous AI applications, with on-device deployment becoming increasingly important for reducing cloud computing costs and protecting user privacy…”

Conference Proceeding

6

McPAL: Scaling Unstructured Sparse Inference with Multi-Chiplet HBM-PIM Architecture for LLMs by Liu, Shiwei, Huang, Zhirui, Yu, Jiangnan, Liu, Qi, Chen, Chixiao

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Large language models (LLMs) have gained significant attention recently. However, executing LLM is memory-bound due to the extensive memory accesses. Process-in-memory (PIM…”

Conference Proceeding

7

KVO-LLM: Boosting Long-Context Generation Throughput for Batched LLM Inference by Li, Zhenyu, Lyu, Dongxu, Wang, Gang, Chen, Yuzhou, Chen, Liyan, Li, Wenjie, Jiang, Jianfei, Sun, Yanan, He, Guanghui

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…With the widespread deployment of long-context large language models (LLMs), efficient and high-quality generation is becoming increasingly important…”

Conference Proceeding

8

A Novel Wavefront-Based High Parallel Solution for HEVC Encoding by Chen, Keji, Sun, Jun, Duan, Yizhou, Guo, Zongming

ISSN: 1051-8215, 1558-2205

Published: New York IEEE 01.01.2016

Published in IEEE transactions on circuits and systems for video technology (01.01.2016)
“… On data level, optimal single-instruction-multiple-data algorithms are designed for the enhanced coding tools, i.e…”

Journal Article

9

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing by Oliveira, Geraldo F., Olgun, Ataberk, Yaglikci, Abdullah Giray, Bostanci, F. Nisa, Gomez-Luna, Juan, Ghose, Saugata, Mutlu, Onur

ISSN: 2378-203X

Published: IEEE 02.03.2024

Published in Proceedings - International Symposium on High-Performance Computer Architecture (02.03.2024)
“…, 16,384-262,144-bit-wide) data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limit the effectiveness and applicability of PUD in three ways…”

Conference Proceeding

10

PHCG: Optimizing Simulink Code Generation for Embedded System With SIMD Instructions by Su, Zhuo, Wang, Dongyan, Yu, Zehong, Yang, Yixiao, Jiang, Yu, Wang, Rui, Chang, Wanli, Li, Wen, Cui, Aiguo, Sun, Jiaguang

ISSN: 0278-0070, 1937-4151

Published: New York IEEE 01.04.2023

Published in IEEE transactions on computer-aided design of integrated circuits and systems (01.04.2023)
“… In this article, we propose PHCG, an optimized code generator for the Simulink model with single-instruction-multiple-data (SIMD…”

Journal Article

11

PairGraph: An Efficient Search-space-aware Accelerator for High-performance Concurrent Pairwise Queries by Fu, Yutao, Long, Zhongtian, Zhang, Yu, He, Zirui, Zhao, Jin, Niu, Qiyuan, Wang, Zixiao, Jin, Hai

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Pairwise queries have been widely used in many applications. Although several approaches have been recently proposed to accelerate a single query, they still…”

Conference Proceeding

12

Pipirima: Predicting Patterns in Sparsity to Accelerate Matrix Algebra by Bakhtiar, Ubaid, Joo, Donghyeon, Asgari, Bahar

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…While sparsity, a feature of data in many applications, provides optimization opportunities such as reducing unnecessary computations, data transfers, and storage, it causes several challenges, too…”

Conference Proceeding

13

PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs by Yin, Ruokai, Li, Yuhang, Panda, Priyadarshini

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… During deployment on single-instruction-multiple-threads (SIMT) architectures, weights are stored in low-precision integer (INT…”

Conference Proceeding

14

PIMGCN: A ReRAM-Based PIM Design for Graph Convolutional Network Acceleration by Yang, Tao, Li, Dongyue, Han, Yibo, Zhao, Yilong, Liu, Fangxin, Liang, Xiaoyao, He, Zhezhi, Jiang, Li

Published: IEEE 05.12.2021

Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“…Graph Convolutional Network (GCN) is a promising but computing- and memory-intensive learning model…”

Conference Proceeding

15

A parallel algorithm for generating bicompatible elimination orderings of proper interval graphs by Panda, B.S., Das, Sajal K.

ISSN: 0020-0190, 1872-6119

Published: Amsterdam Elsevier B.V 31.08.2009

Published in Information processing letters (31.08.2009)
“… (Single Instruction Stream Multiple Data Stream Concurrent Read Concurrent Write Parallel Random Access Machine…”

Journal Article

16

EDGE: Event-Driven GPU Execution by Hetherington, Tayler Hicklin, Lubeznov, Maria, Shah, Deval, Aamodt, Tor M.

ISSN: 2641-7936

Published: IEEE 01.09.2019

Published in Proceedings / International Conference on Parallel Architectures and Compilation Techniques (01.09.2019)
“… supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch preconfigured tasks on a GPU without CPU interaction…”

Conference Proceeding

17

Occamy: Memory-efficient GPU Compiler for DNN Inference by Lee, Jaeho, Jeong, Shinnung, Song, Seungbin, Kim, Kunwoo, Choi, Heelim, Kim, Youngsok, Kim, Hanjun

Published: IEEE 09.07.2023

Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023)
“…This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy…”

Conference Proceeding

18

Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators by Wang, Fuyu, Shen, Minghua, Ding, Yufei, Xiao, Nong

Published: IEEE 29.06.2024

Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024)
“…Spatial accelerator is a specialized hardware to provide noticeable performance speedup for tensor computations…”

Conference Proceeding

19

Offloaded MPI message matching: an optimistic approach by Garcia, Jeronimo S., Di Girolamo, Salvatore, Kosta, Sokol, Olmos, J.J. Vegas, Nudelman, Rami, Hoefler, Torsten, Bloch, Gil

Published: IEEE 17.11.2024

Published in SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (17.11.2024)
“… In this work, we propose a bin-based MPI message approach, Optimistic Tag Matching, explicitly designed for the lightweight, highly parallel architectures typical of on-path SmartNICs…”

Conference Proceeding

20

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC by Abdulah, Sameh, Cao, Qinglei, Pei, Yu, Bosilca, George, Dongarra, Jack, Genton, Marc G., Keyes, David E., Ltaief, Hatem, Sun, Ying

ISSN: 1045-9219, 1558-2183

Published: New York IEEE 01.04.2022

Published in IEEE transactions on parallel and distributed systems (01.04.2022)
“… models and optimization of parameters. Spatial data are assumed to possess properties of stationarity or non-stationarity via a kernel fitted to a covariance matrix…”

Journal Article
