RAMCI: a novel asynchronous memory copying mechanism based on I/OAT
Memory copying is one of the most common operations in modern software. It is usually performed as a synchronous (sync) CPU procedure, which incurs overheads such as cache pollution and CPU stalling, especially when copying large data in bulk. To improve this issu...
Saved in:
| Published in: | CCF transactions on high performance computing (Online) Vol. 3; no. 2; pp. 129 - 143 |
|---|---|
| Main authors: | Chen, Zhenke; Li, Dingding; Wang, Zhiwen; Liu, Hai; Tang, Yong |
| Format: | Journal Article |
| Language: | English |
| Published: | Singapore: Springer Singapore; Springer Nature B.V, 1 June 2021 |
| Subjects: | |
| ISSN: | 2524-4922, 2524-4930 |
| Online access: | Get full text |
| Abstract | Memory copying is one of the most common operations in modern software. It is usually performed as a synchronous (sync) CPU procedure, which incurs overheads such as cache pollution and CPU stalling, especially when copying large data in bulk. To address this issue, several works based on I/OAT, a dedicated and popular hardware copy engine on Intel platforms, have been proposed, but several problems remain: (1) a lack of atomic allocation/revocation at the granularity of an I/OAT channel; (2) a lack of interrupt support; and (3) complicated programming interfaces. We propose RAMCI, an asynchronous (async) memory copying mechanism based on the Intel I/OAT engine, which not only reduces the sync overheads but also overcomes the above three issues through (1) a lock mechanism built on the low-level CAS instruction; (2) a lightweight interrupt mechanism that signals the completion of memory copying, instead of the polling pattern, which consumes substantial CPU resources; and (3) a group of well-defined, abstract interfaces that allow programmers to use the underlying free I/OAT channels transparently. To support these interfaces, a novel scheduler for the I/OAT channels is introduced. It splits the source data into several pieces, each of which can be intelligently assigned a dedicated I/OAT channel so that the data is transferred in parallel. We evaluate RAMCI and compare it with other memory copying mechanisms in four NUMA scenarios. The experimental results show that RAMCI improves memory copying performance by up to 4.68× while preserving almost the full capacity for parallel computing. |
|---|---|
| Author | Li, Dingding; Chen, Zhenke; Wang, Zhiwen; Tang, Yong; Liu, Hai |
| Author_xml | 1. Chen, Zhenke (School of Computer Science, South China Normal University); 2. Li, Dingding (ORCID: 0000-0001-9092-9814; email: dingly@scnu.edu.cn; School of Computer Science, South China Normal University); 3. Wang, Zhiwen (School of Computer Science, South China Normal University); 4. Liu, Hai (School of Computer Science, South China Normal University); 5. Tang, Yong (School of Computer Science, South China Normal University) |
| CitedBy_id | crossref_primary_10_1109_TNSE_2022_3188657 crossref_primary_10_1109_TPDS_2024_3373003 crossref_primary_10_1016_j_sysarc_2022_102623 |
| Cites_doi | 10.1145/2540708.2540725 10.1109/JIOT.2020.2984332 10.1007/3-540-36108-1_18 10.1109/CLUSTR.2007.4629228 10.1109/TC.2007.1036 10.1109/JPROC.2019.2918951 10.1109/JIOT.2018.2868334 10.1109/FPL.2007.4380711 10.1109/CloudCom.2017.14 10.1007/s42514-019-00005-9 10.1109/TPDS.2016.2611659 10.1109/TSUSC.2019.2890841 10.1145/3234463 10.1145/1128022.1128023 10.1109/NAS.2011.15 10.1007/s42514-020-00041-w 10.1109/FPT.2006.270305 10.1109/PACT.2009.31 10.1007/s42514-020-00025-w 10.1145/2901318.2901350 10.1002/ett.4079 10.1145/224964.224988 10.1109/IPDPS.2007.370479 10.1007/s42514-020-00039-4 10.1109/TC.2010.41 10.1109/TPDS.2012.321 10.1109/TPDS.2015.2473166 10.1109/MNET.2015.7166189 10.1109/MWC.2018.1700315 |
| ContentType | Journal Article |
| Copyright | China Computer Federation (CCF) 2021 |
| Copyright_xml | – notice: China Computer Federation (CCF) 2021 – notice: China Computer Federation (CCF) 2021. |
| DOI | 10.1007/s42514-021-00063-y |
| DatabaseName | CrossRef ProQuest SciTech Collection ProQuest Technology Collection ProQuest Central UK/Ireland Advanced Technologies & Computer Science Collection ProQuest Central Essentials ProQuest Central Technology Collection ProQuest One ProQuest Central ProQuest Central Student SciTech Premium Collection ProQuest Computer Science Collection Computer Science Database ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Premium ProQuest One Academic ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest One Academic (retired) ProQuest One Academic UKI Edition |
| DatabaseTitle | CrossRef Advanced Technologies & Aerospace Collection Computer Science Database ProQuest Central Student Technology Collection ProQuest One Academic Middle East (New) ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Essentials ProQuest Computer Science Collection ProQuest One Academic Eastern Edition SciTech Premium Collection ProQuest One Community College ProQuest Technology Collection ProQuest SciTech Collection ProQuest Central ProQuest One Applied & Life Sciences ProQuest One Academic UKI Edition ProQuest Central Korea ProQuest Central (New) ProQuest One Academic ProQuest One Academic (New) |
| DatabaseTitleList | Advanced Technologies & Aerospace Collection |
| Database_xml | – sequence: 1 dbid: BENPR name: ProQuest Central url: https://www.proquest.com/central sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 2524-4930 |
| EndPage | 143 |
| ExternalDocumentID | 10_1007_s42514_021_00063_y |
| GrantInformation_xml | Guangdong Basic and Applied Basic Research Foundation (grant 2019A1515011160); National Natural Science Foundation of China (grants 61972164, 61772211; funder ID: http://dx.doi.org/10.13039/501100001809); Pearl River S&T Nova Program of Guangzhou (grant 201710010189); Guangzhou Key Laboratory of Big Data and Intelligent Education (grant 201905010009) |
| ID | FETCH-LOGICAL-c319t-7fe7bb63757d4de90dafd9ef32b200d23aff0509b850ff275df7a0084ddf28fb3 |
| IEDL.DBID | K7- |
| ISICitedReferencesCount | 8 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000670453400002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 2524-4922 |
| IngestDate | Sat Nov 08 15:45:18 EST 2025 Sat Nov 29 04:01:15 EST 2025 Tue Nov 18 22:28:58 EST 2025 Fri Feb 21 02:47:47 EST 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Keywords | CPU; Memory Copying; I/OAT; NUMA |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c319t-7fe7bb63757d4de90dafd9ef32b200d23aff0509b850ff275df7a0084ddf28fb3 |
| ORCID | 0000-0001-9092-9814 |
| PQID | 2933628593 |
| PQPubID | 6587180 |
| PageCount | 15 |
| ParticipantIDs | proquest_journals_2933628593 crossref_citationtrail_10_1007_s42514_021_00063_y crossref_primary_10_1007_s42514_021_00063_y springer_journals_10_1007_s42514_021_00063_y |
| PublicationCentury | 2000 |
| PublicationDate | 20210600 2021-06-00 20210601 |
| PublicationDateYYYYMMDD | 2021-06-01 |
| PublicationDate_xml | – month: 6 year: 2021 text: 20210600 |
| PublicationDecade | 2020 |
| PublicationPlace | Singapore |
| PublicationPlace_xml | – name: Singapore – name: Beijing |
| PublicationTitle | CCF transactions on high performance computing (Online) |
| PublicationTitleAbbrev | CCF Trans. HPC |
| PublicationYear | 2021 |
| Publisher | Springer Singapore Springer Nature B.V |
| Publisher_xml | – name: Springer Singapore – name: Springer Nature B.V |
| SSID | ssj0002710226 ssib053822361 |
| Score | 2.2076266 |
| SourceID | proquest crossref springer |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 129 |
| SubjectTerms | 3. High Performance Distributed Computing; Central processing units; Channels; Computer Hardware; Computer memory; Computer Science; Computer Systems Organization and Communication Networks; Copying; CPUs; Regular Paper; Software; Stalling |
| Title | RAMCI: a novel asynchronous memory copying mechanism based on I/OAT |
| URI | https://link.springer.com/article/10.1007/s42514-021-00063-y https://www.proquest.com/docview/2933628593 |
| Volume | 3 |
| WOSCitedRecordID | wos000670453400002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVPQU databaseName: Computer Science Database customDbUrl: eissn: 2524-4930 dateEnd: 20241212 omitProxy: false ssIdentifier: ssj0002710226 issn: 2524-4922 databaseCode: K7- dateStart: 20190501 isFulltext: true titleUrlDefault: http://search.proquest.com/compscijour providerName: ProQuest – providerCode: PRVPQU databaseName: ProQuest Central customDbUrl: eissn: 2524-4930 dateEnd: 20241212 omitProxy: false ssIdentifier: ssj0002710226 issn: 2524-4922 databaseCode: BENPR dateStart: 20190501 isFulltext: true titleUrlDefault: https://www.proquest.com/central providerName: ProQuest – providerCode: PRVAVX databaseName: SpringerLINK Contemporary 1997-Present customDbUrl: eissn: 2524-4930 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0002710226 issn: 2524-4922 databaseCode: RSV dateStart: 20190501 isFulltext: true titleUrlDefault: https://link.springer.com/search?facet-content-type=%22Journal%22 providerName: Springer Nature |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2524-4922&client=summon |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2524-4922&client=summon |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2524-4922&client=summon |
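
The abstract describes three ideas: claiming a channel atomically with CAS, learning of completion without polling, and a scheduler that splits one copy into pieces with a dedicated channel each. The following is a minimal sketch of how those ideas fit together, not RAMCI's actual interface. Every name in it (`Channel`, `claim_free_channels`, `split_ranges`, `async_copy`) is hypothetical. Threads stand in for I/OAT DMA channels, a small lock emulates the atomicity of a hardware CAS instruction (Python exposes none), and futures stand in for interrupt-driven completion.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

FREE, BUSY = 0, 1


class Channel:
    """Hypothetical stand-in for one I/OAT DMA channel."""

    def __init__(self, ident):
        self.ident = ident
        self._state = FREE
        # Assumption: this lock only emulates the atomicity that a
        # hardware compare-and-swap instruction would provide.
        self._guard = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Set the state to `new` only if it equals `expected`; report success."""
        with self._guard:
            if self._state != expected:
                return False
            self._state = new
            return True


def claim_free_channels(channels, wanted):
    """Claim up to `wanted` channels; a channel is owned only if its CAS wins."""
    owned = []
    for ch in channels:
        if len(owned) == wanted:
            break
        if ch.compare_and_swap(FREE, BUSY):
            owned.append(ch)
    return owned


def split_ranges(length, pieces):
    """Cut [0, length) into `pieces` contiguous, near-equal (start, end) ranges."""
    base, extra = divmod(length, pieces)
    ranges, start = [], 0
    for i in range(pieces):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def _copy_piece(dst, src, rng, ch):
    """Copy one piece, then release (revoke) the channel that carried it."""
    start, end = rng
    try:
        dst[start:end] = src[start:end]
    finally:
        ch.compare_and_swap(BUSY, FREE)


def async_copy(dst, src, channels, pool, max_pieces=4):
    """Start a piecewise copy of `src` into `dst`; return one future per piece."""
    owned = claim_free_channels(channels, max_pieces)
    if not owned:
        raise RuntimeError("no free channel; a caller could fall back to a CPU copy")
    ranges = split_ranges(len(src), len(owned))
    return [pool.submit(_copy_piece, dst, src, rng, ch)
            for rng, ch in zip(ranges, owned)]


if __name__ == "__main__":
    chans = [Channel(i) for i in range(4)]
    data = bytes(range(256)) * 64
    out = bytearray(len(data))
    with ThreadPoolExecutor(max_workers=4) as workers:
        pending = async_copy(out, data, chans, workers)
        # The caller is free to do other work here before collecting results.
        for fut in pending:
            fut.result()
    print(bytes(out) == data)
```

In this sketch the caller gets futures back immediately and blocks only when it needs the data, which mirrors the async pattern the abstract contrasts with a synchronous CPU copy. A real implementation would program DMA descriptors and handle NUMA placement, neither of which is modeled here.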