High Performance Hadoop Distributed File System



Bibliographic Details
Published in:	The International journal of networked and distributed computing (Online) Vol. 8; No. 3; pp. 119-123
Main Authors:	Elkawkagy, Mohamed, Elbeh, Heba
Format:	Journal Article
Language:	English
Published:	Dordrecht Springer Netherlands 01.06.2020 Springer
Subjects:	Cloud fault tolerance HDFS reliability Research Article HDFS fault tolerance Cloud reliability
ISSN:	2211-7938, 2211-7946
Online Access:	Full text

Description
Abstract:	Although most companies will be running 1000-node Hadoop systems by the end of 2020, Hadoop implementations still face many challenges, such as security, fault tolerance, and flexibility. Hadoop is a software framework for handling big data, and it includes a distributed file system called the Hadoop Distributed File System (HDFS). HDFS handles fault tolerance through data replication: it copies the data to multiple DataNodes, which provides reliability and availability. Although data replication works well, it still wastes considerable time because it uses a single pipeline. The proposed approach improves the performance of HDFS by transferring data blocks over multiple pipelines instead of a single pipeline. In addition, each DataNode updates its reliability value after each round and sends the updated value to the NameNode. The NameNode sorts the DataNodes according to their reliability values. When a client submits a request to upload a data block, the NameNode replies with a list of high-reliability DataNodes that will achieve high performance. The proposed approach is fully implemented, and the experimental results show that it improves the performance of HDFS write operations.
DOI:	10.2991/ijndc.k.200515.007
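
The write path described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names (NameNode, report, choose_targets, send_block, write_block), the replica count of 3, and the in-memory "transfer" are all assumptions made for illustration, and the abstract does not say how a DataNode computes its reliability value, so the sketch simply stores whatever value is reported.

```python
from concurrent.futures import ThreadPoolExecutor


class NameNode:
    """Toy NameNode that ranks DataNodes by their last reported reliability."""

    def __init__(self):
        self.reliability = {}  # DataNode id -> latest reported value

    def report(self, node_id, value):
        # Called by a DataNode after each round with its updated value.
        # How the value is computed is not specified here (assumption).
        self.reliability[node_id] = value

    def choose_targets(self, replicas=3):
        # Sort DataNodes from most to least reliable and return the top ones.
        ranked = sorted(self.reliability, key=self.reliability.get, reverse=True)
        return ranked[:replicas]


def send_block(node_id, block):
    # Hypothetical stand-in for a network transfer; returns an acknowledgement.
    return (node_id, len(block))


def write_block(name_node, block, replicas=3):
    targets = name_node.choose_targets(replicas)
    # One pipeline per target, run concurrently, instead of a single chained
    # pipeline that forwards the block from one DataNode to the next.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        return list(pool.map(lambda n: send_block(n, block), targets))
```

With four DataNodes reporting values 0.7, 0.95, 0.4, and 0.9, choose_targets(3) would return the three highest-ranked nodes, and write_block would send the block to each of them over its own pipeline.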