Replica-aware data recovery performance improvement for Hadoop system with NVM

Bibliographic details
Published in: CCF Transactions on High Performance Computing (Online), Volume 3, Issue 2, pp. 144-156
Main authors: Li, Xin; Li, Huijie; Lu, Youyou; Zhao, Yanchao; Qin, Xiaolin
Format: Journal Article
Language: English
Published: Singapore: Springer Singapore, 1 June 2021
Springer Nature B.V
ISSN: 2524-4922, 2524-4930
Description
Abstract: Non-volatile memory (NVM) is a promising device for storing data and accelerating big data analysis because of its excellent I/O performance. However, we find that simply replacing hard disk drives (HDDs) with NVM does not bring the expected performance improvement. In this paper, we take data recovery in the Hadoop Distributed File System (HDFS) as a case study to investigate how to take advantage of NVM performance. We analyze the data recovery mechanism in HDFS and find that the configuration of replication tasks in the DataNode significantly affects data recovery. We conduct extensive analysis and experiments on tuning this configuration and report some interesting findings. With the new configuration, data recovery performance improves by 17% to 71%, and the execution performance of MapReduce jobs improves by 28% to 59%. We also find that sudden data recovery causes disordered competition for network resources, which degrades the performance of MapReduce jobs. Hence, we present a priority-aware multi-stage data recovery method, which improves MapReduce job performance by a further 32.5%.
DOI: 10.1007/s42514-021-00066-9