RAMCI: a novel asynchronous memory copying mechanism based on I/OAT

Detailed bibliography
Published in: CCF transactions on high performance computing (Online), Vol. 3, No. 2, pp. 129–143
Main authors: Chen, Zhenke; Li, Dingding; Wang, Zhiwen; Liu, Hai; Tang, Yong
Medium: Journal Article
Language: English
Published: Singapore: Springer Singapore, 1 June 2021
Springer Nature B.V.
ISSN:2524-4922, 2524-4930
Abstract: Memory copying is one of the most common operations in modern software. It is usually performed as a synchronous (sync) CPU procedure, which incurs overheads such as cache pollution and CPU stalls, especially when large amounts of data are copied in bulk. To address this issue, several approaches based on I/OAT, a dedicated and widely used hardware copy engine on Intel platforms, have been proposed, but they still suffer from three problems: (1) no atomic allocation/revocation at the granularity of an I/OAT channel; (2) insufficient interrupt support; and (3) complicated programming interfaces. We propose RAMCI, an asynchronous (async) memory copying mechanism based on the Intel I/OAT engine that not only reduces the sync overheads but also overcomes the above three issues through (1) a lock mechanism built on the low-level CAS instruction; (2) a lightweight interrupt mechanism that signals the completion of memory copying, replacing the polling pattern, which consumes significant CPU resources; and (3) a set of well-defined, abstract interfaces that let programmers use the underlying free I/OAT channels transparently. To support these interfaces, a novel scheduler for the I/OAT channels is introduced. It splits the source data into several pieces, each of which can be intelligently assigned a dedicated I/OAT channel so that the data is transferred in parallel. We evaluate RAMCI against other memory copying mechanisms in four NUMA scenarios. The experimental results show that RAMCI improves memory copying performance by up to 4.68× while achieving almost full parallel computing capability.
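The abstract names two mechanisms that are easier to picture in code: per-channel atomic allocation/revocation using CAS, and a scheduler that splits one copy into pieces spread over free channels. The C11 sketch below illustrates only that general technique under stated assumptions; it is not RAMCI's implementation and does not touch I/OAT hardware. The channel pool, `NUM_CHANNELS`, the owner-id scheme, and the helpers `chan_try_claim`, `chan_release`, `chan_claim_many`, and `split_copy` are all hypothetical, and `memcpy` stands in for a descriptor submitted to a hardware channel. Completion is synchronous here, so the interrupt-driven completion path is not modeled.

```c
/*
 * Hypothetical sketch, not RAMCI's code: a fixed pool of "channels" where
 * each channel is claimed or released with a single compare-and-swap on an
 * owner word, and a copy is split into contiguous pieces, one per claimed
 * channel. memcpy stands in for the hardware copy engine.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NUM_CHANNELS 4 /* hypothetical pool size */
#define CHAN_FREE 0    /* owner value meaning "unallocated" */

/* owner[i] == CHAN_FREE: channel i is free; otherwise it holds the owner id.
 * Static atomics are zero-initialized, so every channel starts free. */
static atomic_int owner[NUM_CHANNELS];

/* Claim channel i for `who` (who != CHAN_FREE). The CAS succeeds only if the
 * channel was free, so two callers can never own the same channel. */
static int chan_try_claim(int i, int who) {
    int expected = CHAN_FREE;
    return atomic_compare_exchange_strong(&owner[i], &expected, who);
}

/* Release channel i; the CAS succeeds only for the current owner. */
static int chan_release(int i, int who) {
    int expected = who;
    return atomic_compare_exchange_strong(&owner[i], &expected, CHAN_FREE);
}

/* Claim up to `want` free channels, store their indices in out[], return count. */
static int chan_claim_many(int who, int want, int out[]) {
    int got = 0;
    for (int i = 0; i < NUM_CHANNELS && got < want; i++)
        if (chan_try_claim(i, who))
            out[got++] = i;
    return got;
}

/* Split an n-byte copy across all currently free channels. Returns the number
 * of channels used, or 0 if none was free (a caller would fall back to a CPU copy). */
static int split_copy(void *dst, const void *src, size_t n, int who) {
    int chans[NUM_CHANNELS];
    int k = chan_claim_many(who, NUM_CHANNELS, chans);
    if (k == 0)
        return 0;
    size_t base = n / (size_t)k, rem = n % (size_t)k, off = 0;
    for (int j = 0; j < k; j++) {
        size_t len = base + ((size_t)j < rem ? 1 : 0); /* spread the remainder */
        /* A real engine would submit a descriptor on channel chans[j] here. */
        memcpy((char *)dst + off, (const char *)src + off, len);
        off += len;
    }
    assert(off == n); /* the pieces cover the buffer exactly once */
    for (int j = 0; j < k; j++) {
        int ok = chan_release(chans[j], who);
        assert(ok);
        (void)ok;
    }
    return k;
}
```

A caller would invoke, for example, `split_copy(dst, src, n, owner_id)` with a nonzero owner id. Because claiming and revoking each use one CAS on that channel's owner word, each channel changes hands all-or-nothing without a global lock, which is the per-channel atomicity the abstract attributes to the CAS-based lock mechanism.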
Affiliation (all authors): School of Computer Science, South China Normal University
Contact: Li, Dingding, dingly@scnu.edu.cn; ORCID 0000-0001-9092-9814
Copyright: China Computer Federation (CCF) 2021
DOI: 10.1007/s42514-021-00063-y
Discipline: Computer Science
Funding: Guangdong Basic and Applied Basic Research Foundation (grant 2019A1515011160); National Natural Science Foundation of China (grants 61972164, 61772211); Pearl River S&T Nova Program of Guangzhou (grant 201710010189); Guangzhou Key Laboratory of Big Data and Intelligent Education (grant 201905010009)
Peer-reviewed: yes
Keywords: CPU; Memory Copying; I/OAT; NUMA
Publication place: Singapore; Beijing
Journal abbreviation: CCF Trans. HPC
Subjects: 3. High Performance Distributed Computing; Central processing units; Channels; Computer Hardware; Computer memory; Computer Science; Computer Systems Organization and Communication Networks; Copying; CPUs; Regular Paper; Software; Stalling
Full text: https://link.springer.com/article/10.1007/s42514-021-00063-y; https://www.proquest.com/docview/2933628593