Search results - single instruction multiple data parallel computation model

1

A GPU-based numerical manifold method for modeling the formation of the excavation damaged zone in deep rock tunnels by Liu, Quanshen, Xu, Xiangyu, Wu, Zhijun

ISSN: 0266-352X, 1873-7633

Published: New York Elsevier Ltd 01.02.2020

Published in Computers and geotechnics (01.02.2020)
“… In this study, combined with the zero–thickness cohesive element (ZE) model and explicit integration method, a parallelization technique based on graphics processing units (GPU …”

Journal Article

2

Splitwise: Efficient Generative LLM Inference Using Phase Splitting by Patel, Pratyush, Choukse, Esha, Zhang, Chaojie, Shah, Aashaka, Goiri, Inigo, Maleki, Saeed, Bianchini, Ricardo

Published: IEEE 29.06.2024

Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024)
“… Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs …”

Conference Proceeding

3

Max-PIM: Fast and Efficient Max/Min Searching in DRAM by Zhang, Fan, Angizi, Shaahin, Fan, Deliang

Published: IEEE 05.12.2021

Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“… Recently, in-DRAM computing is becoming one promising technique to address the notorious 'memory-wall' issue for big data processing …”

Conference Proceeding

4

A flexible algorithm for calculating pair interactions on SIMD architectures by Páll, Szilárd, Hess, Berk

ISSN: 0010-4655, 1879-2944

Published: Elsevier B.V 01.12.2013

Published in Computer physics communications (01.12.2013)
“… In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD …”

Journal Article

5

PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type by Yang, Ning, Wang, Zongwu, Sun, Qingxiao, Lu, Liqiang, Liu, Fangxin

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… Large language models (LLMs) have transformed numerous AI applications, with on-device deployment becoming increasingly important for reducing cloud computing costs and protecting user privacy …”

Conference Proceeding

6

McPAL: Scaling Unstructured Sparse Inference with Multi-Chiplet HBM-PIM Architecture for LLMs by Liu, Shiwei, Huang, Zhirui, Yu, Jiangnan, Liu, Qi, Chen, Chixiao

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… Large language models (LLMs) have gained significant attention recently. However, executing LLM is memory-bound due to the extensive memory accesses. Process-in-memory (PIM …”

Conference Proceeding

7

KVO-LLM: Boosting Long-Context Generation Throughput for Batched LLM Inference by Li, Zhenyu, Lyu, Dongxu, Wang, Gang, Chen, Yuzhou, Chen, Liyan, Li, Wenjie, Jiang, Jianfei, Sun, Yanan, He, Guanghui

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… With the widespread deployment of long-context large language models (LLMs), efficient and high-quality generation is becoming increasingly important …”

Conference Proceeding

8

A Novel Wavefront-Based High Parallel Solution for HEVC Encoding by Chen, Keji, Sun, Jun, Duan, Yizhou, Guo, Zongming

ISSN: 1051-8215, 1558-2205

Published: New York IEEE 01.01.2016

Published in IEEE transactions on circuits and systems for video technology (01.01.2016)
“… On data level, optimal single-instruction-multiple-data algorithms are designed for the enhanced coding tools, i.e …”

Journal Article

9

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing by Oliveira, Geraldo F., Olgun, Ataberk, Yaglikci, Abdullah Giray, Bostanci, F. Nisa, Gomez-Luna, Juan, Ghose, Saugata, Mutlu, Onur

ISSN: 2378-203X

Published: IEEE 02.03.2024

Published in Proceedings - International Symposium on High-Performance Computer Architecture (02.03.2024)
“… , 16,384-262,144-bit-wide) data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limit the effectiveness and applicability of PUD in three ways …”

Conference Proceeding

10

PHCG: Optimizing Simulink Code Generation for Embedded System With SIMD Instructions by Su, Zhuo, Wang, Dongyan, Yu, Zehong, Yang, Yixiao, Jiang, Yu, Wang, Rui, Chang, Wanli, Li, Wen, Cui, Aiguo, Sun, Jiaguang

ISSN: 0278-0070, 1937-4151

Published: New York IEEE 01.04.2023

Published in IEEE transactions on computer-aided design of integrated circuits and systems (01.04.2023)
“… In this article, we propose PHCG, an optimized code generator for the Simulink model with single-instruction-multiple-data (SIMD …”

Journal Article

11

PairGraph: An Efficient Search-space-aware Accelerator for High-performance Concurrent Pairwise Queries by Fu, Yutao, Long, Zhongtian, Zhang, Yu, He, Zirui, Zhao, Jin, Niu, Qiyuan, Wang, Zixiao, Jin, Hai

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… Pairwise queries have been widely used in many applications. Although several approaches have been recently proposed to accelerate a single query, they still …”

Conference Proceeding

12

Pipirima: Predicting Patterns in Sparsity to Accelerate Matrix Algebra by Bakhtiar, Ubaid, Joo, Donghyeon, Asgari, Bahar

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… While sparsity, a feature of data in many applications, provides optimization opportunities such as reducing unnecessary computations, data transfers, and storage, it causes several challenges, too …”

Conference Proceeding

13

PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs by Yin, Ruokai, Li, Yuhang, Panda, Priyadarshini

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… During deployment on single-instruction-multiple-threads (SIMT) architectures, weights are stored in low-precision integer (INT …”

Conference Proceeding

14

PIMGCN: A ReRAM-Based PIM Design for Graph Convolutional Network Acceleration by Yang, Tao, Li, Dongyue, Han, Yibo, Zhao, Yilong, Liu, Fangxin, Liang, Xiaoyao, He, Zhezhi, Jiang, Li

Published: IEEE 05.12.2021

Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“… Graph Convolutional Network (GCN) is a promising but computing- and memory-intensive learning model …”

Conference Proceeding

15

A parallel algorithm for generating bicompatible elimination orderings of proper interval graphs by Panda, B.S., Das, Sajal K.

ISSN: 0020-0190, 1872-6119

Published: Amsterdam Elsevier B.V 31.08.2009

Published in Information processing letters (31.08.2009)
“… (Single Instruction Stream Multiple Data Stream Concurrent Read Concurrent Write Parallel Random Access Machine …”

Journal Article

16

EDGE: Event-Driven GPU Execution by Hetherington, Tayler Hicklin, Lubeznov, Maria, Shah, Deval, Aamodt, Tor M.

ISSN: 2641-7936

Published: IEEE 01.09.2019

Published in Proceedings / International Conference on Parallel Architectures and Compilation Techniques (01.09.2019)
“… supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch preconfigured tasks on a GPU without CPU interaction …”

Conference Proceeding

17

Occamy: Memory-efficient GPU Compiler for DNN Inference by Lee, Jaeho, Jeong, Shinnung, Song, Seungbin, Kim, Kunwoo, Choi, Heelim, Kim, Youngsok, Kim, Hanjun

Published: IEEE 09.07.2023

Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023)
“… This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy …”

Conference Proceeding

18

Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators by Wang, Fuyu, Shen, Minghua, Ding, Yufei, Xiao, Nong

Published: IEEE 29.06.2024

Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024)
“… Spatial accelerator is a specialized hardware to provide noticeable performance speedup for tensor computations …”

Conference Proceeding

19

Offloaded MPI message matching: an optimistic approach by Garcia, Jeronimo S., Di Girolamo, Salvatore, Kosta, Sokol, Olmos, J.J. Vegas, Nudelman, Rami, Hoefler, Torsten, Bloch, Gil

Published: IEEE 17.11.2024

Published in SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (17.11.2024)
“… In this work, we propose a bin-based MPI message approach, Optimistic Tag Matching, explicitly designed for the lightweight, highly parallel architectures typical of on-path SmartNICs …”

Conference Proceeding

20

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC by Abdulah, Sameh, Cao, Qinglei, Pei, Yu, Bosilca, George, Dongarra, Jack, Genton, Marc G., Keyes, David E., Ltaief, Hatem, Sun, Ying

ISSN: 1045-9219, 1558-2183

Published: New York IEEE 01.04.2022

Published in IEEE transactions on parallel and distributed systems (01.04.2022)
“… models and optimization of parameters. Spatial data are assumed to possess properties of stationarity or non-stationarity via a kernel fitted to a covariance matrix …”

Journal Article

