A distributed in-memory key-value store system on heterogeneous CPU–GPU cluster
In-memory key-value stores play a critical role in many data-intensive applications to provide high-throughput and low latency data accesses. In-memory key-value stores have several unique properties that include (1) data-intensive operations demanding high memory bandwidth for fast data accesses, (...
| Published in: | The VLDB Journal, Vol. 26, No. 5, pp. 729-750 |
|---|---|
| Main authors: | Kai Zhang, Kaibo Wang, Yuan Yuan, Lei Guo, Rubao Li, Xiaodong Zhang, Bingsheng He, Jiayu Hu, Bei Hua |
| Format: | Journal Article |
| Language: | English |
| Published: | Berlin/Heidelberg: Springer Berlin Heidelberg; Springer Nature B.V, 1 October 2017 |
| Subjects: | |
| ISSN: | 1066-8888, 0949-877X |
| Online access: | Full text |
| Abstract | In-memory key-value stores play a critical role in many data-intensive applications by providing high-throughput, low-latency data access. They have several distinctive properties: (1) data-intensive operations that demand high memory bandwidth for fast data access, (2) high data parallelism and simple computing operations that demand many slim parallel computing units, and (3) a large working set. However, our experiments show that homogeneous multicore CPU systems are increasingly mismatched to these properties: they do not provide massive data parallelism or high memory bandwidth; their powerful but limited number of computing cores does not meet the demands of this data processing task; and the cache hierarchy may not work well for the large working set. In this paper, we present the design and implementation of Mega-KV, a distributed in-memory key-value store system on a heterogeneous CPU–GPU cluster. By effectively utilizing the high memory bandwidth and latency-hiding capability of GPUs, Mega-KV provides fast data access and significantly improves overall performance and energy efficiency over homogeneous CPU architectures. Mega-KV shows excellent scalability, processing up to 623 million key-value operations per second on a cluster with eight CPUs and eight GPUs while delivering an efficiency of up to 299 thousand operations per second per watt (KOPS/W). |
|---|---|
| Author | Wang, Kaibo; Hua, Bei; Zhang, Xiaodong; Hu, Jiayu; Yuan, Yuan; Guo, Lei; He, Bingsheng; Zhang, Kai; Li, Rubao |
| Author_xml | – sequence: 1 givenname: Kai orcidid: 0000-0001-7518-5466 surname: Zhang fullname: Zhang, Kai email: kay21s@gmail.com organization: Fudan University – sequence: 2 givenname: Kaibo surname: Wang fullname: Wang, Kaibo organization: Google Inc – sequence: 3 givenname: Yuan surname: Yuan fullname: Yuan, Yuan organization: The Ohio State University – sequence: 4 givenname: Lei surname: Guo fullname: Guo, Lei organization: Google Inc – sequence: 5 givenname: Rubao surname: Li fullname: Li, Rubao organization: The Ohio State University – sequence: 6 givenname: Xiaodong surname: Zhang fullname: Zhang, Xiaodong organization: The Ohio State University – sequence: 7 givenname: Bingsheng surname: He fullname: He, Bingsheng organization: National University of Singapore – sequence: 8 givenname: Jiayu surname: Hu fullname: Hu, Jiayu organization: University of Science and Technology of China – sequence: 9 givenname: Bei surname: Hua fullname: Hua, Bei organization: University of Science and Technology of China |
| CitedBy_id | crossref_primary_10_1145_3538225 crossref_primary_10_1016_j_eswa_2024_123570 |
| Cites_doi | 10.1145/2508148.2485964 10.14778/2536360.2536370 10.1145/1629575.1629577 10.14778/2732967.2732976 10.1145/1807128.1807152 10.1145/2377677.2377681 10.1145/1816038.1815998 10.1145/2508148.2485926 10.1109/IGCC.2011.6008565 10.1145/191839.191886 10.1016/j.jalgor.2003.12.002 10.1109/ISPASS.2012.6189209 10.14778/2536206.2536210 10.1109/L-CA.2013.17 10.14778/1952376.1952381 10.1109/ICDE.2017.120 10.1109/PACT.2011.17 10.1145/1851275.1851207 10.1145/2882903.2915224 10.1016/B978-012722442-8/50043-4 10.1145/1376616.1376670 10.1145/1345206.1345220 10.1109/ICDE.2014.6816677 10.1145/2435264.2435306 10.1145/2391229.2391238 10.1109/MICRO.2012.19 10.1145/2145816.2145874 10.14778/2809974.2809984 10.1145/2236584.2236592 10.1145/2168836.2168855 10.14778/2850583.2850585 10.1145/2741948.2741956 10.1145/2254756.2254766 10.1145/2541940.2541951 10.1007/s00450-015-0300-5 10.1145/2806777.2806836 10.1145/2591971.2592002 10.1145/258533.258660 10.1109/MM.2014.41 10.1109/ICPP.2012.31 10.1145/1713254.1713276 10.1145/2517349.2522713 10.1145/2749469.2750416 |
| ContentType | Journal Article |
| Copyright | Springer-Verlag GmbH Germany 2017; Copyright Springer Science & Business Media 2017 |
| Copyright_xml | – notice: Springer-Verlag GmbH Germany 2017 – notice: Copyright Springer Science & Business Media 2017 |
| DBID | AAYXX CITATION JQ2 |
| DOI | 10.1007/s00778-017-0479-0 |
| DatabaseName | CrossRef ProQuest Computer Science Collection |
| DatabaseTitle | CrossRef ProQuest Computer Science Collection |
| DatabaseTitleList | ProQuest Computer Science Collection |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 0949-877X |
| EndPage | 750 |
| ExternalDocumentID | 10_1007_s00778_017_0479_0 |
| ID | FETCH-LOGICAL-c316t-62ac11a784cdaeaaf8f4a455ac170580ebd3ef65d5f8a87e2bcb9766e825499d3 |
| IEDL.DBID | RSV |
| ISICitedReferencesCount | 5 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000410771700006&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1066-8888 |
| IngestDate | Thu Sep 25 00:52:05 EDT 2025 Sat Nov 29 03:17:17 EST 2025 Tue Nov 18 20:53:23 EST 2025 Fri Feb 21 02:37:43 EST 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | 5 |
| Keywords | Heterogeneous systems; Energy efficiency; Distributed systems; GPU; Key-value store |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c316t-62ac11a784cdaeaaf8f4a455ac170580ebd3ef65d5f8a87e2bcb9766e825499d3 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0001-7518-5466 |
| PQID | 1938960347 |
| PQPubID | 2043708 |
| PageCount | 22 |
| ParticipantIDs | proquest_journals_1938960347 crossref_citationtrail_10_1007_s00778_017_0479_0 crossref_primary_10_1007_s00778_017_0479_0 springer_journals_10_1007_s00778_017_0479_0 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-10-01 |
| PublicationDateYYYYMMDD | 2017-10-01 |
| PublicationDate_xml | – month: 10 year: 2017 text: 2017-10-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | Berlin/Heidelberg |
| PublicationPlace_xml | – name: Berlin/Heidelberg – name: New York |
| PublicationSubtitle | The International Journal on Very Large Data Bases |
| PublicationTitle | The VLDB journal |
| PublicationTitleAbbrev | The VLDB Journal |
| PublicationYear | 2017 |
| Publisher | Springer Berlin Heidelberg; Springer Nature B.V |
| Publisher_xml | – name: Springer Berlin Heidelberg – name: Springer Nature B.V |
| SourceID | proquest; crossref; springer |
| SourceType | Aggregation Database; Enrichment Source; Index Database; Publisher |
| StartPage | 729 |
| SubjectTerms | Bandwidths; Central processing units; Clusters; Computation; Computer memory; Computer Science; Computer upgrading; CPUs; Data processing; Data storage; Database Management; Distributed memory; Regular Paper; Universal Serial Bus |
| Title | A distributed in-memory key-value store system on heterogeneous CPU–GPU cluster |
| URI | https://link.springer.com/article/10.1007/s00778-017-0479-0 https://www.proquest.com/docview/1938960347 |
| Volume | 26 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+distributed+in-memory+key-value+store+system+on+heterogeneous+CPU%E2%80%93GPU+cluster&rft.jtitle=The+VLDB+journal&rft.au=Zhang%2C+Kai&rft.au=Wang%2C+Kaibo&rft.au=Yuan%2C+Yuan&rft.au=Guo%2C+Lei&rft.date=2017-10-01&rft.pub=Springer+Berlin+Heidelberg&rft.issn=1066-8888&rft.eissn=0949-877X&rft.volume=26&rft.issue=5&rft.spage=729&rft.epage=750&rft_id=info:doi/10.1007%2Fs00778-017-0479-0&rft.externalDocID=10_1007_s00778_017_0479_0 |