Replica-aware data recovery performance improvement for Hadoop system with NVM

Bibliographic Details
Published in: CCF Transactions on High Performance Computing (Online), Vol. 3, no. 2, pp. 144-156
Main Authors: Li, Xin; Li, Huijie; Lu, Youyou; Zhao, Yanchao; Qin, Xiaolin
Format: Journal Article
Language: English
Published: Singapore: Springer Singapore, 01.06.2021
Springer Nature B.V.
ISSN: 2524-4922, 2524-4930
Description
Summary: Non-volatile memory (NVM) is a promising device for storing data and accelerating big data analysis because of its excellent I/O performance. However, we find that simply replacing hard disk drives (HDDs) with NVM does not bring the expected performance improvement. In this paper, we take the data recovery issue in the Hadoop Distributed File System (HDFS) as a case study to investigate how to take advantage of the performance of NVM. We analyze the data recovery mechanism in HDFS and find that the configuration of replication tasks in the DataNode can significantly affect data recovery. We conduct extensive analysis and experiments on tuning this configuration and report several findings. With the new configuration, we improve data recovery performance by 17% to 71%. The optimized configuration also improves the execution performance of MapReduce jobs by 28% to 59%. We also find that sudden data recovery causes disordered competition for network resources, which reduces the performance of MapReduce jobs. Hence, we present a priority-aware multi-stage data recovery method, which improves the performance of MapReduce jobs by an additional 32.5%.
DOI: 10.1007/s42514-021-00066-9