A distributed in-memory key-value store system on heterogeneous CPU–GPU cluster

Bibliographic Details
Published in:	The VLDB Journal, Vol. 26, No. 5, pp. 729–750
Main Authors:	Zhang, Kai; Wang, Kaibo; Yuan, Yuan; Guo, Lei; Li, Rubao; Zhang, Xiaodong; He, Bingsheng; Hu, Jiayu; Hua, Bei
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg, 2017-10-01; Springer Nature B.V
Subjects:	Bandwidths; Central processing units; Clusters; Computation; Computer memory; Computer Science; Computer upgrading; CPUs; Data processing; Data storage; Database Management; Distributed memory; Regular Paper; Universal Serial Bus; Heterogeneous systems; Energy efficiency; Distributed systems; GPU; Key-value store
ISSN:	1066-8888, 0949-877X
Online Access:	Full text

Abstract	In-memory key-value stores play a critical role in many data-intensive applications by providing high-throughput, low-latency data access. They have several distinctive properties: (1) data-intensive operations that demand high memory bandwidth for fast data access, (2) high data parallelism and simple computing operations that demand many slim parallel computing units, and (3) a large working set. However, our experiments show that homogeneous multicore CPU systems are increasingly mismatched to these properties: they do not provide massive data parallelism and high memory bandwidth; their small number of powerful computing cores does not satisfy the demands of this data processing task; and their cache hierarchy may offer little benefit for the large working set. In this paper, we present the design and implementation of Mega-KV, a distributed in-memory key-value store system on a heterogeneous CPU–GPU cluster. By effectively utilizing the high memory bandwidth and latency-hiding capability of GPUs, Mega-KV provides fast data access and significantly improves overall performance and energy efficiency over homogeneous CPU architectures. Mega-KV shows excellent scalability and processes up to 623 million key-value operations per second on a cluster with eight CPUs and eight GPUs, while delivering an energy efficiency of up to 299 thousand operations per second per watt (KOPS/W).
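The energy-efficiency unit quoted in the abstract, KOPS/W, can be read as throughput in operations per second divided by power in watts, scaled by one thousand. The following is a minimal sketch of that arithmetic, not code from the paper; the function names are hypothetical. It also assumes, purely for illustration, that the two "up to" figures were measured in the same run, which the abstract does not state.

```python
def kops_per_watt(ops_per_second: float, watts: float) -> float:
    """Energy efficiency in thousand operations per second per watt (KOPS/W)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ops_per_second / watts / 1_000.0


def implied_watts(ops_per_second: float, kops_w: float) -> float:
    """Power draw implied by a throughput figure and an efficiency figure."""
    if kops_w <= 0:
        raise ValueError("efficiency must be positive")
    return ops_per_second / (kops_w * 1_000.0)


# Figures quoted in the abstract; both are "up to" (peak) values.
PEAK_OPS_PER_SECOND = 623e6
PEAK_KOPS_PER_WATT = 299.0

# ASSUMPTION (not stated in the source): both peaks come from the same run.
# If they do not, this number is only a rough order-of-magnitude guide.
hypothetical_watts = implied_watts(PEAK_OPS_PER_SECOND, PEAK_KOPS_PER_WATT)
print(f"Implied cluster power under that assumption: {hypothetical_watts:.0f} W")
```

Under that assumption the implied draw is on the order of two kilowatts for the whole cluster; the paper itself should be consulted for the measured power figures.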
Author_xml	– sequence: 1; name: Zhang, Kai; ORCID: 0000-0001-7518-5466; email: kay21s@gmail.com; organization: Fudan University
– sequence: 2; name: Wang, Kaibo; organization: Google Inc
– sequence: 3; name: Yuan, Yuan; organization: The Ohio State University
– sequence: 4; name: Guo, Lei; organization: Google Inc
– sequence: 5; name: Li, Rubao; organization: The Ohio State University
– sequence: 6; name: Zhang, Xiaodong; organization: The Ohio State University
– sequence: 7; name: He, Bingsheng; organization: National University of Singapore
– sequence: 8; name: Hu, Jiayu; organization: University of Science and Technology of China
– sequence: 9; name: Hua, Bei; organization: University of Science and Technology of China
ContentType	Journal Article
Copyright	Springer-Verlag GmbH Germany 2017 Copyright Springer Science & Business Media 2017
Copyright_xml	– notice: Springer-Verlag GmbH Germany 2017 – notice: Copyright Springer Science & Business Media 2017
DOI	10.1007/s00778-017-0479-0
Discipline	Computer Science
EISSN	0949-877X
EndPage	750
ISICitedReferencesCount	5
ISSN	1066-8888
IsPeerReviewed	false
IsScholarly	true
Issue	5
Keywords	Heterogeneous systems; Energy efficiency; Distributed systems; GPU; Key-value store
Language	English
ORCID	0000-0001-7518-5466
PageCount	22
PublicationCentury	2000
PublicationDate	2017-10-01
PublicationDateYYYYMMDD	2017-10-01
PublicationDate_xml	– month: 10 year: 2017 text: 2017-10-01 day: 01
PublicationDecade	2010
PublicationPlace	Berlin/Heidelberg
PublicationPlace_xml	– name: Berlin/Heidelberg – name: New York
PublicationSubtitle	The International Journal on Very Large Data Bases
PublicationTitle	The VLDB journal
PublicationTitleAbbrev	The VLDB Journal
PublicationYear	2017
Publisher	Springer Berlin Heidelberg Springer Nature B.V
Publisher_xml	– name: Springer Berlin Heidelberg – name: Springer Nature B.V
StartPage	729
SubjectTerms	Bandwidths Central processing units Clusters Computation Computer memory Computer Science Computer upgrading CPUs Data processing Data storage Database Management Distributed memory Regular Paper Universal Serial Bus
Title	A distributed in-memory key-value store system on heterogeneous CPU–GPU cluster
URI	https://link.springer.com/article/10.1007/s00778-017-0479-0 https://www.proquest.com/docview/1938960347
Volume	26