Enhancing throughput of the Hadoop Distributed File System for interaction-intensive tasks

Bibliographic details
Published in: Journal of parallel and distributed computing, Volume 74, Issue 8, pp. 2770-2779
Main authors: Hua, Xiayu; Wu, Hao; Li, Zheng; Ren, Shangping
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier Inc, 1 August 2014
Elsevier
ISSN: 0743-7315, 1096-0848
Description
Summary: The Hadoop Distributed File System (HDFS) is designed to run on commodity hardware and can be used as a stand-alone, general-purpose distributed file system (Hdfs user guide, 2008). It provides access to bulk data with high I/O throughput, which makes it suitable for applications with large I/O data sets. However, the performance of HDFS decreases dramatically when it handles operations on interaction-intensive files, i.e., files that are relatively small but frequently accessed. The paper analyzes the cause of the throughput degradation when accessing interaction-intensive files and presents an enhanced HDFS architecture, along with an associated storage allocation algorithm, that overcomes the performance degradation problem. Experiments show that, with the proposed architecture and the associated storage allocation algorithm, HDFS throughput for interaction-intensive files increases by 300% on average, with only a negligible performance decrease for large data set tasks.
Highlights:
• Analyzed the performance degradation of HDFS caused by interaction-intensive tasks.
• Designed a two-layer structure to improve the performance of handling I/O requests.
• Integrated caches to reduce the overhead of accessing interaction-intensive files.
• Developed a PSO-based storage allocation algorithm to improve I/O throughput.
• Designed a set of experiments to evaluate the performance of the proposed methods.
DOI: 10.1016/j.jpdc.2014.03.010