Map-Balance-Reduce: An improved parallel programming model for load balancing of MapReduce



Detailed bibliography
Published in: Future Generation Computer Systems, Volume 105, pp. 993-1001
Main authors: Li, Jianjiang, Liu, Yajun, Pan, Jian, Zhang, Peng, Chen, Wei, Wang, Lizhe
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.04.2020
ISSN:0167-739X
Description
Summary: With the advent of the big data era, demand for massive data processing applications keeps growing. MapReduce is currently the most widely used data processing programming model, but it has shortcomings in some cases. MapReduce programming is based on key/value pairs: the output of the Map tasks is routed to Reduce task nodes by key, so all data with the same key can only be processed by a single Reduce node. If the data corresponding to one key, or a few keys, accounts for most of the data set, the corresponding Reduce tasks carry an unbalanced load. To address this defect, this paper proposes a new parallel programming model, Map-Balance-Reduce (MBR). It runs on an improved Hadoop framework and can effectively process data with unbalanced keys. Two different scheduling strategies, preprocessing scheduling and self-adaption scheduling, are designed to realize the MBR programming model. Test results show that, compared with the MapReduce programming model, the MBR programming model under Hadoop improves efficiency by 9.7% to 17.6% when the test data is unevenly distributed, while adding only a 1.02% time overhead on conventional evenly distributed data.
•A new programming model, "Map-Balance-Reduce", is proposed.
•New scheduling algorithms, preprocessing scheduling and self-adaption scheduling, are designed.
•The proposed work can efficiently process unevenly distributed data for MapReduce.
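The skew problem the abstract describes can be made concrete with a small sketch. The snippet below is illustrative only (the partitioner, the `hot` key, and the `fanout` splitting are assumptions for demonstration, not the paper's actual MBR algorithm): a standard partitioner sends every record with the same key to one reducer, so a dominant key overloads that reducer, while a Balance-style step that splits the hot key into sub-keys spreads its records across reducers, at the cost of a later merge per original key.

```python
from collections import Counter

def partition(key, n_reducers=4):
    # Deterministic stand-in for a hash partitioner: records with the
    # same key always land on the same reducer.
    return sum(key.encode()) % n_reducers

# Skewed Map output: one "hot" key accounts for 90% of the records.
records = [("hot", 1)] * 90 + [(f"k{i}", 1) for i in range(10)]

before = Counter(partition(k) for k, _ in records)
print(max(before.values()))  # one reducer receives almost all records

# Balance step (illustrative): split the hot key into `fanout` sub-keys
# so its records spread over several reducers; a final merge per
# original key would recombine the partial results.
fanout = 4
balanced = [
    (f"{k}#{i % fanout}", v) if k == "hot" else (k, v)
    for i, (k, v) in enumerate(records)
]

after = Counter(partition(k) for k, _ in balanced)
print(max(after.values()))  # load is now spread nearly evenly
```

With 100 records, the most loaded reducer drops from holding nearly the whole data set to roughly a quarter of it, which is the kind of imbalance reduction the MBR model targets.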
DOI:10.1016/j.future.2017.03.013