Search Results - shared memory parallel algorithm
-
1
Max-PIM: Fast and Efficient Max/Min Searching in DRAM
Published: IEEE 05.12.2021. Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021). “… In this work, for the first time, we propose a novel 'Min/Max-in-memory' algorithm based on iterative XNOR bit-wise comparison, which supports parallel in-memory searching for minimum and maximum…”
Conference Proceeding -
2
Modularity‐based parallel protein design algorithm with an implementation using shared memory programming
ISSN: 0887-3585, 1097-0134. Published: Hoboken, USA John Wiley & Sons, Inc 01.03.2022. Published in Proteins, structure, function, and bioinformatics (01.03.2022). “…–based parallel protein design algorithm. The modular architecture of the protein structure is exploited by considering an intermediate structural organization between secondary structure and domain defined as protein unit (PU…”
Journal Article -
3
Joint direct and transposed sparse matrix‐vector multiplication for multithreaded CPUs
ISSN: 1532-0626, 1532-0634. Published: Hoboken Wiley Subscription Services, Inc 10.07.2021. Published in Concurrency and computation (10.07.2021). “…‐vector multiplication (SpMMTV). In this article, we present a parallel SpMMTV algorithm for shared‐memory CPUs…”
Journal Article -
4
A Shared-Memory Parallel Alpha-Tree Algorithm for Extreme Dynamic Ranges
ISSN: 1057-7149, 1941-0042. Published: United States IEEE 01.01.2025. Published in IEEE transactions on image processing (01.01.2025). “…The α-tree is an effective hierarchical image representation used for connected…”
Journal Article -
5
pSyncPIM: Partially Synchronous Execution of Sparse Matrix Operations for All-Bank PIM Architectures
Published: IEEE 29.06.2024. Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024). “…Recent commercial incarnations of processing-in-memory (PIM) maintain the standard DRAM interface and employ the all-bank mode execution to maximize bank-level memory bandwidth…”
Conference Proceeding -
6
SplitSync: Bank Group-Level Split-Synchronization for High-Performance DRAM PIM
Published: IEEE 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025). “…Processing in Memory (PIM) architectures enhance memory bandwidth by utilizing bank-level parallelism, typically implemented with a SIMD structure where all banks operate simultaneously under a single command…”
Conference Proceeding -
7
PIMDup: An Optimized Deduplication Design on a Real Processing-in-Memory System
Published: IEEE 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025). “… and storage are separated in a processor-centric design, necessitating multiple memory hierarchy traversals and causing inefficiencies…”
Conference Proceeding -
8
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images
ISSN: 0162-8828, 1939-3539, 2160-9292. Published: United States IEEE 01.03.2018. Published in IEEE transactions on pattern analysis and machine intelligence (01.03.2018). “… However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel…”
Journal Article -
9
UPVSS: Jointly Managing Vector Similarity Search with Near-Memory Processing Systems
Published: IEEE 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025). “… to performance limitations in conventional von Neumann architecture with shared memory buses. This data movement bottleneck restricts the efficiency and scalability of vector similarity search due to insufficient memory bandwidth…”
Conference Proceeding -
10
AttenPIM: Accelerating LLM Attention with Dual-mode GEMV in Processing-in-Memory
Published: IEEE 22.06.2025. Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025). “… While recent heterogeneous architectures attempt to address the memory-bound bottleneck from attention computations by processing-in-memory (PIM…”
Conference Proceeding -
11
Hardware Support for Durable Atomic Instructions for Persistent Parallel Programming
Published: IEEE 09.07.2023. Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023). “…Persistent memory is emerging as an attractive main memory fabric capable of hosting persistent data…”
Conference Proceeding -
12
SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link
Published: IEEE 21.10.2023. Published in 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (21.10.2023). “…Disaggregated memory has been gaining significant traction as a promising solution for scaling memory capacity and better utilizing memory resources in data centers…”
Conference Proceeding -
13
DARIC: A Data Reuse-Friendly CGRA for Parallel Data Access via Elastic FIFOs
Published: IEEE 09.07.2023. Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023). “… For parallel data accesses, uniform memory partitioning is usually introduced to CGRA for better pipelining performance…”
Conference Proceeding -
14
Scalable task parallelism for NUMA: A uniform abstraction for coordinated scheduling and memory management
Published: ACM 01.09.2016. Published in 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (01.09.2016). “…Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality…”
Conference Proceeding -
15
A Shared-Memory Parallel Algorithm for Updating Single-Source Shortest Paths in Large Dynamic Networks
ISSN: 2640-0316. Published: IEEE 01.12.2018. Published in 2018 IEEE 25th International Conference on High Performance Computing (HiPC) (01.12.2018). “… To this end, we present a novel two-step shared-memory algorithm for updating SSSP on weighted large-scale graphs…”
Conference Proceeding -
16
A hybrid shared/distributed memory parallel genetic algorithm for optimization of laminate composites
ISSN: 0263-8223, 1879-1085. Published: Elsevier Ltd 01.01.2014. Published in Composite structures (01.01.2014). “…This work presents a genetic algorithm combining two types of computational parallelization methods, resulting in a hybrid shared/distributed memory algorithm based on the island model using…”
Journal Article -
17
BLOwing Trees to the Ground: Layout Optimization of Decision Trees on Racetrack Memory
Published: IEEE 05.12.2021. Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021). “…Modern distributed low power systems tend to integrate machine learning algorithms, which are directly executed on the distributed devices (on the edge…”
Conference Proceeding -
18
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in Large-Scale Dynamic Networks
ISSN: 1045-9219, 1558-2183. Published: New York IEEE 01.04.2022. Published in IEEE transactions on parallel and distributed systems (01.04.2022). “… We present a novel parallel algorithmic framework for updating the SSSP in large-scale dynamic networks and implement it on the shared-memory and GPU platforms…”
Journal Article -
19
Separating Mechanism from Policy in STM
Published: IEEE 21.10.2023. Published in 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (21.10.2023). “… On one hand, Software Transactional Memory (STM) is easy, because it allows programmers to simply mark regions of sequential code as requiring atomicity, and then the compiler ensures that no races manifest…”
Conference Proceeding -
20
Shared-Memory Parallel Edmonds Blossom Algorithm for Maximum Cardinality Matching in General Graphs
ISSN: 2164-7062. Published: United States IEEE 01.05.2024. Published in 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (01.05.2024). “…The Edmonds Blossom algorithm is implemented here using depth-first search, which is intrinsically serial…”
Conference Proceeding, Journal Article