Dealing with Small Files Problem in Hadoop Distributed File System

Bibliographic details
Published in: Procedia Computer Science, Vol. 79, pp. 1001-1012
Main authors: Bende, Sachin; Shedge, Rajashree
Format: Journal Article
Language: English
Published: Elsevier B.V., 2016
Subjects: Hadoop, HDFS, MapReduce, Small files problem
ISSN: 1877-0509 (print and online)
Online access: https://dx.doi.org/10.1016/j.procs.2016.03.127 (open access)

Abstract: The usage of Hadoop has grown greatly in recent years, and its adoption is widespread. Notable large users such as Yahoo, Facebook, Netflix, and Amazon use Hadoop mainly for unstructured data analysis, as the Hadoop framework works very well with both structured and unstructured data. The Hadoop Distributed File System (HDFS) is designed for storing large files. When a large number of small files must be stored, HDFS runs into problems, because all files in HDFS are managed by a single server (the NameNode). Various methods have been proposed to deal with the small files problem in HDFS. This paper gives a comparative analysis of these methods.
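
The abstract's central point is that one server (the NameNode) tracks metadata for every file. Many small files therefore cost far more metadata than a few large files holding the same bytes. The Python sketch below illustrates this in two parts:

- a rough object count;
- the general merge-plus-index idea behind remedies such as Hadoop Archives (HAR).

The following are illustrative assumptions, not taken from the paper:

- about 150 bytes of NameNode heap per object, a commonly quoted rule of thumb;
- a 128 MiB block size;
- the helper names `namenode_objects`, `pack`, and `read_member`, which are hypothetical.

```python
"""Rough model of NameNode metadata load for small vs. merged files.

Assumptions (illustrative, not from the paper):
  * each file and each block is one object in NameNode memory
    (directories are ignored here);
  * ~150 bytes of heap per object, a commonly quoted rule of thumb;
  * 128 MiB block size, the default since Hadoop 2.x.
"""

BYTES_PER_OBJECT = 150           # assumed heap cost per namespace object
BLOCK_SIZE = 128 * 1024 * 1024   # assumed HDFS block size in bytes


def namenode_objects(file_sizes):
    """Return the number of file + block objects for the given file sizes."""
    total = 0
    for size in file_sizes:
        blocks = (size + BLOCK_SIZE - 1) // BLOCK_SIZE  # integer ceiling; empty file -> 0 blocks
        total += 1 + blocks                             # one file object plus its blocks
    return total


def pack(files):
    """Concatenate small files into one container and build a name -> (offset, length) index.

    This mirrors the general merge-plus-index idea only; it is not the
    on-disk format of HAR, SequenceFile, or any specific tool.
    """
    container = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = (len(container), len(data))  # where this member starts and how long it is
        container.extend(data)
    return bytes(container), index


def read_member(container, index, name):
    """Recover one original small file from the container via the index."""
    offset, length = index[name]
    return container[offset:offset + length]


if __name__ == "__main__":
    n_files, file_size = 1_000_000, 10_000  # one million 10 KB files
    separate = namenode_objects([file_size] * n_files)
    merged = namenode_objects([file_size * n_files])  # the index file itself is ignored
    print("separate:", separate, "objects,", separate * BYTES_PER_OBJECT, "bytes of heap")
    print("merged:  ", merged, "objects,", merged * BYTES_PER_OBJECT, "bytes of heap")

    container, index = pack({"a.txt": b"alpha", "b.txt": b"bravo!"})
    print(read_member(container, index, "b.txt"))
```

Under these assumptions, one million 10 KB files need 2,000,000 namespace objects, or about 300 MB of NameNode heap. The same 10 GB packed into a single file needs only 76 objects. The cost is an extra lookup: readers must consult the index to find a member. This is why merge-based approaches typically pair the combined file with an index or mapping layer.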

Authors and affiliations:
1. Bende, Sachin: Ramrao Adik Institute of Technology, Nerul, Navi Mumbai 400 706, Maharashtra, India
2. Shedge, Rajashree (rajashree.shedge@rait.ac.in): Ramrao Adik Institute of Technology, Nerul, Navi Mumbai 400 706, Maharashtra, India
Copyright: 2016 The Authors
DOI: 10.1016/j.procs.2016.03.127
Discipline: Computer Science
Times cited (Web of Science, at time of indexing): 33
Open access: Yes
Peer reviewed: Yes
Keywords: HDFS, Hadoop, Small files problem, MapReduce
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0)