Optimization of Small Sized File Access Efficiency in Hadoop Distributed File System by Integrating Virtual File System Layer


Bibliographic details
Published in: International journal of advanced computer science & applications, Vol. 13, No. 6
Main authors: Alange, Neeta; Mathur, Anjali
Format: Journal Article
Language: English
Published: West Yorkshire, Science and Information (SAI) Organization Limited, 01.01.2022
ISSN: 2158-107X, 2156-5570
Online access: Full text
Description
Abstract: Hadoop was developed to address three big data problems: storing large datasets, handling data in different formats, and coping with data generated at high speed. This paper proposes a solution that improves access efficiency and access time for small files. A novel architecture, VFS-HDFS, is designed to optimize small-file access, with significant improvement over the existing solutions, i.e. HDFS sequence files, HAR, and NHAR. In the proposed work, a virtual file system layer is added as a wrapper on top of the existing HDFS architecture, which itself is left unaltered. The drawbacks of the existing techniques, i.e. the Flat File Technique and the Table Chain Technique as implemented in HDFS HAR, NHAR, and sequence files, are overcome by using the Bucket Chain Technique. The files to merge into a single bucket are selected using an ensemble classifier, which combines several different classifiers; combining multiple classifiers gives more accurate results. The proposed system obtains better results than the existing system in terms of access efficiency for small files in HDFS.
DOI: 10.14569/IJACSA.2022.0130626