Search Results - single instruction multiple data parallel computation model
1. A GPU-based numerical manifold method for modeling the formation of the excavation damaged zone in deep rock tunnels
ISSN: 0266-352X, 1873-7633. Published: New York, Elsevier Ltd, 01.02.2020. Published in Computers and geotechnics (01.02.2020).
“…In this study, combined with the zero–thickness cohesive element (ZE) model and explicit integration method, a parallelization technique based on graphics processing units (GPU…”
Journal Article
2. Splitwise: Efficient Generative LLM Inference Using Phase Splitting
Published: IEEE, 29.06.2024. Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024).
“…Generative large language model (LLM) applications are growing rapidly, leading to large-scale deployments of expensive and power-hungry GPUs…”
Conference Proceeding
3. Max-PIM: Fast and Efficient Max/Min Searching in DRAM
Published: IEEE, 05.12.2021. Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021).
“…Recently, in-DRAM computing is becoming one promising technique to address the notorious 'memory-wall' issue for big data processing…”
Conference Proceeding
4. A flexible algorithm for calculating pair interactions on SIMD architectures
ISSN: 0010-4655, 1879-2944. Published: Elsevier B.V., 01.12.2013. Published in Computer physics communications (01.12.2013).
“… In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD…”
Journal Article
5. PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type
Published: IEEE, 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025).
“…Large language models (LLMs) have transformed numerous AI applications, with on-device deployment becoming increasingly important for reducing cloud computing costs and protecting user privacy…”
Conference Proceeding
6. McPAL: Scaling Unstructured Sparse Inference with Multi-Chiplet HBM-PIM Architecture for LLMs
Published: IEEE, 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025).
“…Large language models (LLMs) have gained significant attention recently. However, executing LLM is memory-bound due to the extensive memory accesses. Process-in-memory (PIM…”
Conference Proceeding
7. KVO-LLM: Boosting Long-Context Generation Throughput for Batched LLM Inference
Published: IEEE, 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025).
“…With the widespread deployment of long-context large language models (LLMs), efficient and high-quality generation is becoming increasingly important…”
Conference Proceeding
8. A Novel Wavefront-Based High Parallel Solution for HEVC Encoding
ISSN: 1051-8215, 1558-2205. Published: New York, IEEE, 01.01.2016. Published in IEEE transactions on circuits and systems for video technology (01.01.2016).
“… On data level, optimal single-instruction-multiple-data algorithms are designed for the enhanced coding tools, i.e…”
Journal Article
9. MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing
ISSN: 2378-203X. Published: IEEE, 02.03.2024. Published in Proceedings - International Symposium on High-Performance Computer Architecture (02.03.2024).
“…, 16,384-262,144-bit-wide) data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limit the effectiveness and applicability of PUD in three ways…”
Conference Proceeding
10. PHCG: Optimizing Simulink Code Generation for Embedded System With SIMD Instructions
ISSN: 0278-0070, 1937-4151. Published: New York, IEEE, 01.04.2023. Published in IEEE transactions on computer-aided design of integrated circuits and systems (01.04.2023).
“… In this article, we propose PHCG, an optimized code generator for the Simulink model with single-instruction-multiple-data (SIMD…”
Journal Article
11. PairGraph: An Efficient Search-space-aware Accelerator for High-performance Concurrent Pairwise Queries
Published: IEEE, 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025).
“…Pairwise queries have been widely used in many applications. Although several approaches have been recently proposed to accelerate a single query, they still…”
Conference Proceeding
12. Pipirima: Predicting Patterns in Sparsity to Accelerate Matrix Algebra
Published: IEEE, 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025).
“…While sparsity, a feature of data in many applications, provides optimization opportunities such as reducing unnecessary computations, data transfers, and storage, it causes several challenges, too…”
Conference Proceeding
13. PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs
Published: IEEE, 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025).
“… During deployment on single-instruction-multiple-threads (SIMT) architectures, weights are stored in low-precision integer (INT…”
Conference Proceeding
14. PIMGCN: A ReRAM-Based PIM Design for Graph Convolutional Network Acceleration
Published: IEEE, 05.12.2021. Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021).
“…Graph Convolutional Network (GCN) is a promising but computing- and memory-intensive learning model…”
Conference Proceeding
15. A parallel algorithm for generating bicompatible elimination orderings of proper interval graphs
ISSN: 0020-0190, 1872-6119. Published: Amsterdam, Elsevier B.V., 31.08.2009. Published in Information processing letters (31.08.2009).
“… (Single Instruction Stream Multiple Data Stream Concurrent Read Concurrent Write Parallel Random Access Machine…”
Journal Article
16. EDGE: Event-Driven GPU Execution
ISSN: 2641-7936. Published: IEEE, 01.09.2019. Published in Proceedings / International Conference on Parallel Architectures and Compilation Techniques (01.09.2019).
“… supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch preconfigured tasks on a GPU without CPU interaction…”
Conference Proceeding
17. Occamy: Memory-efficient GPU Compiler for DNN Inference
Published: IEEE, 09.07.2023. Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023).
“…This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy…”
Conference Proceeding
18. Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators
Published: IEEE, 29.06.2024. Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024).
“…Spatial accelerator is a specialized hardware to provide noticeable performance speedup for tensor computations…”
Conference Proceeding
19. Offloaded MPI message matching: an optimistic approach
Published: IEEE, 17.11.2024. Published in SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (17.11.2024).
“… In this work, we propose a bin-based MPI message approach, Optimistic Tag Matching, explicitly designed for the lightweight, highly parallel architectures typical of on-path SmartNICs…”
Conference Proceeding
20. Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC
ISSN: 1045-9219, 1558-2183. Published: New York, IEEE, 01.04.2022. Published in IEEE transactions on parallel and distributed systems (01.04.2022).
“… models and optimization of parameters. Spatial data are assumed to possess properties of stationarity or non-stationarity via a kernel fitted to a covariance matrix…”
Journal Article