SDM: Sharing-Enabled Disaggregated Memory System with Cache Coherent Compute Express Link

Bibliographic Details
Published in: 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 86-98
Main Authors: Lee, Hyokeun; Choi, Kwanseok; Lee, Hyuk-Jae; Sim, Jaewoong
Format: Conference Proceeding
Language: English
Published: IEEE 21.10.2023
Description
Summary: Disaggregated memory has been gaining significant traction as a promising solution for scaling memory capacity and better utilizing memory resources in data centers. However, a disaggregated memory system that can simultaneously achieve high performance and user transparency is still not available. Although some modern interconnect technologies now feature hardware coherence protocols that can potentially enable data sharing among multiple computing nodes in a user-transparent manner, naively applying these technologies to disaggregated memory systems results in non-negligible performance overheads. In this work, we propose SDM, a sharing-enabled, cache-coherent disaggregated memory system that effectively utilizes modern interconnect technology. The key design principle of SDM is to implement a novel, dedicated control flow that efficiently enables data sharing among multiple computing nodes without the need to modify user applications, by leveraging the message types defined in the modern memory expansion standard, Compute Express Link (CXL). We also introduce resource management and speculative memory access mechanisms that do not interfere with normal memory transaction channels, thereby further improving the performance of disaggregated memory systems. We evaluate our design based on an in-house simulation framework with detailed analytical models that mimic a cache-coherent multi-node disaggregated memory system. The results show that SDM outperforms the optimized baseline system, which is similar to the one employing CXL 3.0, by 5.77× and 2.65× for two distinctive benchmark suites.
DOI: 10.1109/PACT58117.2023.00016