Optimization of Small Sized File Access Efficiency in Hadoop Distributed File System by Integrating Virtual File System Layer


Bibliographic details
Published in: International Journal of Advanced Computer Science & Applications, Vol. 13, No. 6
Main authors: Alange, Neeta; Mathur, Anjali
Format: Journal Article
Language: English
Published: West Yorkshire, Science and Information (SAI) Organization Limited, 01.01.2022
Subjects: Big Data, Chains, Classifiers, Efficiency, Optimization
ISSN: 2158-107X, 2156-5570
Online access: Full text
Abstract: Hadoop was created to address the main challenges of big data: storing large datasets, handling data in different formats, and coping with data that is generated at high speed. This work proposes a solution that improves access efficiency and access time for small files. A novel architecture, VFS-HDFS, is designed to optimize small-file access and shows significant improvement over the existing solutions, i.e. HDFS sequence files, HAR, and NHAR. In the proposed work, a virtual file system layer is added as a wrapper on top of the existing HDFS architecture, and the existing HDFS architecture itself is left unaltered. The drawbacks of the existing techniques, i.e. the Flat File Technique and the Table Chain Technique, which are implemented in HDFS HAR, NHAR, and sequence files, are overcome by using the Bucket Chain Technique. The files to merge into a single bucket are selected using an ensemble classifier, which combines several different classifiers; combining multiple classifiers gives more accurate results. With the proposed system, better results are obtained than with the existing system in terms of access efficiency for small files in HDFS.
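The abstract does not spell out how the Bucket Chain Technique lays out data, so the following is only a minimal sketch of the general idea it builds on: packing many small files into a few larger containers and keeping an index so each file can still be read individually. The names `Bucket` and `BucketStore`, the fixed byte capacity, and the in-memory storage are all assumptions made for illustration; the sketch does not reproduce the authors' method, does not use any HDFS API, and fills buckets in arrival order rather than selecting files with a classifier.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """One merged container: a byte buffer plus an index of the files inside it."""
    capacity: int
    data: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # file name -> (offset, length)

    def has_room(self, size: int) -> bool:
        return len(self.data) + size <= self.capacity

    def append(self, name: str, payload: bytes) -> None:
        # Record where the file starts and how long it is, then append its bytes.
        self.index[name] = (len(self.data), len(payload))
        self.data.extend(payload)


class BucketStore:
    """Hypothetical wrapper that packs small files into fixed-capacity buckets."""

    def __init__(self, bucket_capacity: int):
        self.bucket_capacity = bucket_capacity
        self.buckets = []    # list of Bucket objects, in creation order
        self.location = {}   # file name -> position of its bucket in self.buckets

    def put(self, name: str, payload: bytes) -> None:
        if len(payload) > self.bucket_capacity:
            raise ValueError("payload is larger than one bucket")
        # Open a new bucket when there is none yet or the last one is full.
        if not self.buckets or not self.buckets[-1].has_room(len(payload)):
            self.buckets.append(Bucket(self.bucket_capacity))
        self.buckets[-1].append(name, payload)
        self.location[name] = len(self.buckets) - 1

    def get(self, name: str) -> bytes:
        # Two lookups: which bucket holds the file, then where it sits inside it.
        bucket = self.buckets[self.location[name]]
        offset, length = bucket.index[name]
        return bytes(bucket.data[offset:offset + length])


store = BucketStore(bucket_capacity=10)
store.put("a.txt", b"hello")
store.put("b.txt", b"world")
store.put("c.txt", b"!!")  # does not fit in the first bucket, so a second one is opened
```

In this sketch three small files end up in two containers, and a read needs only the bucket position plus an offset and length. A real deployment would persist the buckets as large HDFS files, so the file system tracks a few large objects instead of many small ones.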
Copyright 2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.14569/IJACSA.2022.0130626