Search results - Theory of computation → Shared memory algorithms

1

Verifying Lock-Free Search Structure Templates (Artifact) by Patel, Nisarg, Shasha, Dennis, Wies, Thomas

ISSN: 2509-8195

Published: Schloss Dagstuhl – Leibniz-Zentrum für Informatik 12.09.2024

“… We present and verify template algorithms for lock-free concurrent search structures that cover a broad range of existing implementations based on lists and skiplists …”

Dataset

2

pSyncPIM: Partially Synchronous Execution of Sparse Matrix Operations for All-Bank PIM Architectures by Baek, Daehyeon, Hwang, Soojin, Huh, Jaehyuk

Published: IEEE 29.06.2024

Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (29.06.2024)
“… Recent commercial incarnations of processing-in-memory (PIM) maintain the standard DRAM interface and employ the all-bank mode execution to maximize bank-level memory bandwidth …”

Conference proceeding

3

Max-PIM: Fast and Efficient Max/Min Searching in DRAM by Zhang, Fan, Angizi, Shaahin, Fan, Deliang

Published: IEEE 05.12.2021

Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“… In this work, for the first time, we propose a novel 'Min/Max-in-memory' algorithm based on iterative XNOR bit-wise comparison, which supports parallel in-memory searching for minimum and maximum …”

Conference proceeding

4

SplitSync: Bank Group-Level Split-Synchronization for High-Performance DRAM PIM by Yoon, Byungkuk, Han, Sanghyeok, Park, Gyeonghwan, Kim, Jae-Joon

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… Processing in Memory (PIM) architectures enhance memory bandwidth by utilizing bank-level parallelism, typically implemented with a SIMD structure where all banks operate simultaneously under a single command …”

Conference proceeding

5

PIMDup: An Optimized Deduplication Design on a Real Processing-in-Memory System by Yeh, Chun-Le, Chen, Liang-Chi, Ho, Chien-Chung, Chang, Yu-Ming, Chang, Da-Wei

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… To overcome these challenges, we explore UPMEM's DPU, a processing-in-memory (PIM) technology that reduces data movement by performing computations directly within memory …”

Conference proceeding

6

UPVSS: Jointly Managing Vector Similarity Search with Near-Memory Processing Systems by Liu, Chun-Chien, Wu, Chun-Feng, Jin, Yunho

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… to performance limitations in conventional von Neumann architecture with shared memory buses. This data movement bottleneck restricts the efficiency and scalability of vector similarity search due to insufficient memory bandwidth …”

Conference proceeding

7

AttenPIM: Accelerating LLM Attention with Dual-mode GEMV in Processing-in-Memory by Chen, Liyan, Lyu, Dongxu, Li, Zhenyu, Jiang, Jianfei, Wang, Qin, Mao, Zhigang, Jing, Naifeng

Published: IEEE 22.06.2025

Published in 2025 62nd ACM/IEEE Design Automation Conference (DAC) (22.06.2025)
“… While recent heterogeneous architectures attempt to address the memory-bound bottleneck from attention computations by processing-in-memory (PIM …”

Conference proceeding

8

BLOwing Trees to the Ground: Layout Optimization of Decision Trees on Racetrack Memory by Hakert, Christian, Khan, Asif Ali, Chen, Kuan-Hsun, Hameed, Fazal, Castrillon, Jeronimo, Chen, Jian-Jia

Published: IEEE 05.12.2021

Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“… Modern distributed low power systems tend to integrate machine learning algorithms, which are directly executed on the distributed devices (on the edge …”

Conference proceeding

9

SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link by Lee, Hyokeun, Choi, Kwanseok, Lee, Hyuk-Jae, Sim, Jaewoong

Published: IEEE 21.10.2023

Published in 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (21.10.2023)
“… Disaggregated memory has been gaining significant traction as a promising solution for scaling memory capacity and better utilizing memory resources in data centers …”

Conference proceeding

10

Hardware Support for Durable Atomic Instructions for Persistent Parallel Programming by Hadi, Khan Shaikhul, Ul Mustafa, Naveed, Heinrich, Mark, Solihin, Yan

Published: IEEE 09.07.2023

Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023)
“… Persistent memory is emerging as an attractive main memory fabric capable of hosting persistent data …”

Conference proceeding

11

Optimizing Persistent Memory Transactions by Zardoshti, Pantea, Zhou, Tingzhe, Liu, Yujie, Spear, Michael

ISSN: 2641-7936

Published: IEEE 01.09.2019

Published in Proceedings / International Conference on Parallel Architectures and Compilation Techniques (01.09.2019)
“… Byte-addressable, non-volatile, random access memory (NVM) has the potential to dramatically accelerate the performance of storage-intensive workloads …”

Conference proceeding

12

Separating Mechanism from Policy in STM by Sheng, Yaodong, Hassan, Ahmed, Spear, Michael

Published: IEEE 21.10.2023

Published in 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (21.10.2023)
“… On one hand, Software Transactional Memory (STM) is easy, because it allows programmers to simply mark regions of sequential code as requiring atomicity, and then the compiler ensures that no races manifest …”

Conference proceeding

13

DARIC: A Data Reuse-Friendly CGRA for Parallel Data Access via Elastic FIFOs by Liu, Dajiang, Mou, Di, Zhu, Rong, Zhuang, Yan, Shang, Jiaxing, Zhong, Jiang, Yin, Shouyi

Published: IEEE 09.07.2023

Published in 2023 60th ACM/IEEE Design Automation Conference (DAC) (09.07.2023)
“… For parallel data accesses, uniform memory partitioning is usually introduced to CGRA for better pipelining performance …”

Conference proceeding

14

Asynchronous Distributed-Memory Parallel Algorithms for Influence Maximization by Singhal, Shubhendra Pal, Hati, Souvadra, Young, Jeffrey, Sarkar, Vivek, Hayashi, Akihiro, Vuduc, Richard

Published: IEEE 17.11.2024

Published in SC24: International Conference for High Performance Computing, Networking, Storage and Analysis (17.11.2024)
“… We propose distributed-memory parallel algorithms for the two main kernels of a state-of-the-art implementation of one IM algorithm, influence maximization via martingales (IMM …”

Conference proceeding

15

Ultrafast CPU/GPU Kernels for Density Accumulation in Placement by Guo, Zizheng, Mai, Jing, Lin, Yibo

Published: IEEE 05.12.2021

Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05.12.2021)
“… Density accumulation is a widely-used primitive operation in physical design, especially for placement. Iterative invocation in the optimization flow makes it …”

Conference proceeding

16

Scalable task parallelism for NUMA: A uniform abstraction for coordinated scheduling and memory management by Drebes, Andi, Pop, Antoniu, Heydemann, Karine, Cohen, Albert, Drach, Nathalie

Published: ACM 01.09.2016

Published in 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (01.09.2016)
“… Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality …”

Conference proceeding

17

Unfair Scheduling Patterns in NUMA Architectures by Ben-David, Naama, Scully, Ziv, Blelloch, Guy E.

ISSN: 2641-7936

Published: IEEE 01.09.2019

Published in Proceedings / International Conference on Parallel Architectures and Compilation Techniques (01.09.2019)
“… Lock-free algorithms are typically designed and analyzed with adversarial scheduling in mind …”

Conference proceeding

18

CAF: Core to core Communication Acceleration Framework by Yipeng Wang, Ren Wang, Herdrich, Andrew, Tsai, James, Solihin, Yan

Published: ACM 01.09.2016

Published in 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (01.09.2016)
“… The traditional way cores communicate is by using shared memory space between them. However, shared memory communication fundamentally involves coherence invalidations …”

Conference proceeding

19

Enumeration of Billions of Maximal Bicliques in Bipartite Graphs without Using GPUs by Pan, Zhe, He, Shuibing, Li, Xu, Zhang, Xuechen, Yin, Yanlong, Wang, Rui, Shou, Lidan, Song, Mingli, Sun, Xian-He, Chen, Gang

Published: IEEE 17.11.2024

Published in SC24: International Conference for High Performance Computing, Networking, Storage and Analysis (17.11.2024)
“… To overcome this limitation, we propose an AdaMBE algorithm. First, we redesign its core operations using local neighborhood information derived from computational subgraphs to minimize redundant memory accesses …”

Conference proceeding

20

Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods by Zhang, Jingrong, Naruse, Akira, Li, Xipeng, Wang, Yong

ISSN: 2167-4337

Published: ACM 11.11.2023

Published in International Conference for High Performance Computing, Networking, Storage and Analysis (Online) (11.11.2023)
“… As data volume grows rapidly, high-performance parallel top-K algorithms become critical …”

Conference proceeding
