Enhancing throughput of the Hadoop Distributed File System for interaction-intensive tasks

The Hadoop Distributed File System (HDFS) is designed to run on commodity hardware and can be used as a stand-alone general purpose distributed file system (Hdfs user guide, 2008). It provides the ability to access bulk data with high I/O throughput. As a result, this system is suitable for applicat...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of parallel and distributed computing Jg. 74; H. 8; S. 2770 - 2779
Hauptverfasser: Hua, Xiayu, Wu, Hao, Li, Zheng, Ren, Shangping
Format: Journal Article
Sprache:Englisch
Veröffentlicht: Amsterdam Elsevier Inc 01.08.2014
Elsevier
Schlagworte:
ISSN:0743-7315, 1096-0848
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The Hadoop Distributed File System (HDFS) is designed to run on commodity hardware and can be used as a stand-alone general purpose distributed file system (Hdfs user guide, 2008). It provides the ability to access bulk data with high I/O throughput. As a result, this system is suitable for applications that have large I/O data sets. However, the performance of HDFS decreases dramatically when handling the operations of interaction-intensive files, i.e., files that have relatively small size but are frequently accessed. The paper analyzes the cause of throughput degradation issue when accessing interaction-intensive files and presents an enhanced HDFS architecture along with an associated storage allocation algorithm that overcomes the performance degradation problem. Experiments have shown that with the proposed architecture together with the associated storage allocation algorithm, the HDFS throughput for interaction-intensive files increases 300% on average with only a negligible performance decrease for large data set tasks. •Analyzed the performance degradation of HDFS caused by interaction-intensive tasks.•Designed a two-layer structure to improve the performance of handling I/O request.•Integrated caches to reduce the overhead of accessing interaction-intensive files.•Developed a PSO-based storage allocation algorithm to improve the I/O throughput.•Designed a set of experiments to evaluate the performance of the proposed methods.
Bibliographie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:0743-7315
1096-0848
DOI:10.1016/j.jpdc.2014.03.010