Enhancing throughput of the Hadoop Distributed File System for interaction-intensive tasks

Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, Vol. 74, no. 8, pp. 2770-2779
Main Authors: Hua, Xiayu; Wu, Hao; Li, Zheng; Ren, Shangping
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier Inc., 01.08.2014
ISSN: 0743-7315, 1096-0848
Description
Summary: The Hadoop Distributed File System (HDFS) is designed to run on commodity hardware and can be used as a stand-alone, general-purpose distributed file system (Hdfs user guide, 2008). It provides access to bulk data with high I/O throughput, which makes it suitable for applications that have large I/O data sets. However, the performance of HDFS decreases dramatically when handling interaction-intensive files, i.e., files that are relatively small but frequently accessed. The paper analyzes the cause of the throughput degradation when accessing interaction-intensive files and presents an enhanced HDFS architecture, along with an associated storage allocation algorithm, that overcomes the performance degradation problem. Experiments show that, with the proposed architecture and the associated storage allocation algorithm, HDFS throughput for interaction-intensive files increases by 300% on average, with only a negligible performance decrease for large data set tasks.
• Analyzed the performance degradation of HDFS caused by interaction-intensive tasks.
• Designed a two-layer structure to improve the performance of handling I/O requests.
• Integrated caches to reduce the overhead of accessing interaction-intensive files.
• Developed a PSO-based storage allocation algorithm to improve the I/O throughput.
• Designed a set of experiments to evaluate the performance of the proposed methods.
DOI: 10.1016/j.jpdc.2014.03.010