Search Results - shared‐memory algorithms
1
Max-PIM: Fast and Efficient Max/Min Searching in DRAM
Published: IEEE 05.12.2021
Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“… In this work, for the first time, we propose a novel 'Min/Max-in-memory' algorithm based on iterative XNOR bit-wise comparison, which supports parallel in-memory searching for minimum and maximum…”
Conference Proceeding -
2
PATSMA: Parameter Auto-tuning for Shared Memory Algorithms
ISSN: 2352-7110
Published: Elsevier B.V 01.09.2024
Published in SoftwareX (01.09.2024)
“…Programs with high levels of complexity often face challenges in adjusting execution parameters, particularly when the ideal value for these parameters may…”
Journal Article -
3
SplitSync: Bank Group-Level Split-Synchronization for High-Performance DRAM PIM
Published: IEEE 22.06.2025
Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Processing in Memory (PIM) architectures enhance memory bandwidth by utilizing bank-level parallelism, typically implemented with a SIMD structure where all…”
Conference Proceeding -
4
pSyncPIM: Partially Synchronous Execution of Sparse Matrix Operations for All-Bank PIM Architectures
Published: IEEE 29.06.2024
Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024)
“…Recent commercial incarnations of processing-in-memory (PIM) maintain the standard DRAM interface and employ the all-bank mode execution to maximize bank-level…”
Conference Proceeding -
5
PIMDup: An Optimized Deduplication Design on a Real Processing-in-Memory System
Published: IEEE 22.06.2025
Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Data deduplication enhances storage efficiency through non-destructive compression but is often hindered by the chunking process, which requires scanning the…”
Conference Proceeding -
6
UPVSS: Jointly Managing Vector Similarity Search with Near-Memory Processing Systems
Published: IEEE 22.06.2025
Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… to performance limitations in conventional von Neumann architecture with shared memory buses. This data movement bottleneck restricts the efficiency and scalability of vector similarity search due to insufficient memory bandwidth…”
Conference Proceeding -
7
AttenPIM: Accelerating LLM Attention with Dual-mode GEMV in Processing-in-Memory
Published: IEEE 22.06.2025
Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“…Large Language Models (LLMs) have demonstrated unprecedented generative performance across a wide range of applications. While recent heterogeneous…”
Conference Proceeding -
8
Parallelization of particle-mass-transfer algorithms on shared-memory, multi-core CPUs
ISSN: 0309-1708
Published: Elsevier Ltd 01.11.2024
Published in Advances in water resources (01.11.2024)
“…Simulating the transfer of mass between particles is not straightforwardly parallelized because it involves the calculation of the influence of many particles…”
Journal Article -
9
DARIC: A Data Reuse-Friendly CGRA for Parallel Data Access via Elastic FIFOs
Published: IEEE 09.07.2023
Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023)
“… Based on the resource graph of DARIC, a mapping algorithm supporting path sharing is proposed…”
Conference Proceeding -
10
Separating Mechanism from Policy in STM
Published: IEEE 21.10.2023
Published in 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (21.10.2023)
“…When designing concurrent data structures (CDSs), it can feel like programmers must choose between performance and convenience. On one hand, Software…”
Conference Proceeding -
11
SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link
Published: IEEE 21.10.2023
Published in 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (21.10.2023)
“…Disaggregated memory has been gaining significant traction as a promising solution for scaling memory capacity and better utilizing memory resources in data…”
Conference Proceeding -
12
Modularity‐based parallel protein design algorithm with an implementation using shared memory programming
ISSN: 0887-3585, 1097-0134
Published: Hoboken, USA John Wiley & Sons, Inc 01.03.2022
Published in Proteins, structure, function, and bioinformatics (01.03.2022)
“…–based parallel protein design algorithm. The modular architecture of the protein structure is exploited by considering an intermediate structural organization between secondary structure and domain defined as protein unit (PU…”
Journal Article -
13
Hardware Support for Durable Atomic Instructions for Persistent Parallel Programming
Published: IEEE 09.07.2023
Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023)
“…Persistent memory is emerging as an attractive main memory fabric capable of hosting persistent data. However, its programmability is hampered by the lack of…”
Conference Proceeding -
14
Mutable locks: Combining the best of spin and sleep locks
ISSN: 1532-0626, 1532-0634
Published: Hoboken Wiley Subscription Services, Inc 25.11.2020
Published in Concurrency and computation (25.11.2020)
“…Summary In this article, we present mutable locks, a synchronization construct with the same semantic of traditional locks (such as spin locks or sleep locks),…”
Journal Article -
15
BLOwing Trees to the Ground: Layout Optimization of Decision Trees on Racetrack Memory
Published: IEEE 05.12.2021
Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“…Modern distributed low power systems tend to integrate machine learning algorithms, which are directly executed on the distributed devices (on the edge…”
Conference Proceeding -
16
Scalable task parallelism for NUMA: A uniform abstraction for coordinated scheduling and memory management
Published: ACM 01.09.2016
Published in 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (01.09.2016)
“…Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality…”
Conference Proceeding -
17
Ultrafast CPU/GPU Kernels for Density Accumulation in Placement
Published: IEEE 05.12.2021
Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“…Density accumulation is a widely-used primitive operation in physical design, especially for placement. Iterative invocation in the optimization flow makes it…”
Conference Proceeding -
18
An enhanced parallel block coordinate descent algorithm with shared memory for solving large-scale user equilibrium problems
ISSN: 1366-5545
Published: Elsevier Ltd 01.12.2025
Published in Transportation research. Part E, Logistics and transportation review (01.12.2025)
“…•OpenMP-based parallelization of the PBCD algorithm to leverage shared memory parallelism efficiently…”
Journal Article -
19
CAF: Core to core Communication Acceleration Framework
Published: ACM 01.09.2016
Published in 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (01.09.2016)
“… The traditional way cores communicate is by using shared memory space between them. However, shared memory communication fundamentally involves coherence invalidations…”
Conference Proceeding -
20
Optimizing Persistent Memory Transactions
ISSN: 2641-7936
Published: IEEE 01.09.2019
Published in Proceedings / International Conference on Parallel Architectures and Compilation Techniques (01.09.2019)
“…Byte-addressable, non-volatile, random access memory (NVM) has the potential to dramatically accelerate the performance of storage-intensive workloads. For…”
Conference Proceeding