Enhancing throughput of the Hadoop Distributed File System for interaction-intensive tasks

Detailed bibliography
Published in: Journal of Parallel and Distributed Computing, Vol. 74, No. 8, pp. 2770-2779
Main authors: Hua, Xiayu; Wu, Hao; Li, Zheng; Ren, Shangping
Medium: Journal Article
Language: English
Publisher: Amsterdam: Elsevier Inc., 1 August 2014
ISSN: 0743-7315, 1096-0848
Description
Abstract: The Hadoop Distributed File System (HDFS) is designed to run on commodity hardware and can be used as a stand-alone, general-purpose distributed file system (HDFS user guide, 2008). It provides high-throughput I/O access to bulk data, which makes it suitable for applications with large I/O data sets. However, HDFS performance drops dramatically when it handles interaction-intensive files, i.e., files that are relatively small but frequently accessed. The paper analyzes the cause of this throughput degradation and presents an enhanced HDFS architecture, together with an associated storage allocation algorithm, that overcomes it. Experiments show that with the proposed architecture and storage allocation algorithm, HDFS throughput for interaction-intensive files increases by 300% on average, with only a negligible performance decrease for tasks on large data sets.
Highlights:
• Analyzed the performance degradation of HDFS caused by interaction-intensive tasks.
• Designed a two-layer structure to improve the performance of handling I/O requests.
• Integrated caches to reduce the overhead of accessing interaction-intensive files.
• Developed a storage allocation algorithm based on particle swarm optimization (PSO) to improve I/O throughput.
• Designed a set of experiments to evaluate the performance of the proposed methods.
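The record names a PSO-based storage allocation algorithm but does not describe it. Purely as a generic illustration of the technique, the sketch below applies standard particle swarm optimization to a hypothetical version of the problem: assign files with known access frequencies to storage nodes so that the busiest node's total load is as small as possible. The objective, the continuous-to-discrete decoding, the function names (`pso_allocate`, `cost`, `decode`, `node_loads`) and the parameter values (`w`, `c1`, `c2`, swarm size, iterations) are all assumptions made for this example, not the paper's formulation.

```python
import random

def node_loads(alloc, freqs, n_nodes):
    """Sum the access frequencies of the files placed on each node."""
    loads = [0.0] * n_nodes
    for f, node in enumerate(alloc):
        loads[node] += freqs[f]
    return loads

def decode(position, n_nodes):
    """Map a continuous particle position to a discrete file -> node allocation."""
    return [min(n_nodes - 1, max(0, int(x))) for x in position]

def cost(position, freqs, n_nodes):
    """Hypothetical objective: load of the busiest node (lower is better)."""
    return max(node_loads(decode(position, n_nodes), freqs, n_nodes))

def pso_allocate(freqs, n_nodes, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic PSO search for a file -> node allocation (illustrative only)."""
    rng = random.Random(seed)
    dim = len(freqs)
    # One coordinate per file; its integer part selects the storage node.
    pos = [[rng.uniform(0, n_nodes) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p, freqs, n_nodes) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep positions inside [0, n_nodes).
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), n_nodes - 1e-9)
            c = cost(pos[i], freqs, n_nodes)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return decode(gbest, n_nodes), gbest_cost
```

For example, `pso_allocate([50, 40, 30, 20, 10, 10], n_nodes=3)` returns a node index for each file together with the resulting peak node load. PSO is a stochastic heuristic, so the result is not guaranteed to be optimal; the fixed seed only makes runs repeatable.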
DOI: 10.1016/j.jpdc.2014.03.010