Dealing with Small Files Problem in Hadoop Distributed File System
| Published in: | Procedia Computer Science, Vol. 79, pp. 1001-1012 |
|---|---|
| Main authors: | Bende, Sachin; Shedge, Rajashree |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 2016 |
| Subjects: | Hadoop; HDFS; MapReduce; Small files problem |
| ISSN: | 1877-0509 |
| Abstract | The usage of Hadoop has increased greatly in recent years, and Hadoop adoption is widespread. Notable large users such as Yahoo, Facebook, Netflix, and Amazon use Hadoop mainly for unstructured data analysis, as the Hadoop framework works very well with both structured and unstructured data. The Hadoop Distributed File System (HDFS) is designed for storing large files, but when a large number of small files must be stored, HDFS faces several problems, because the metadata for all files in HDFS is managed by a single server (the NameNode). Various methods have been proposed to deal with the small files problem in HDFS. This paper gives a comparative analysis of methods that deal with the small files problem in HDFS. |
|---|---|
| Author | Bende, Sachin; Shedge, Rajashree |
| Author_xml | 1. Sachin Bende, Ramrao Adik Institute of Technology, Nerul, Navi Mumbai 400 706, Maharashtra, India; 2. Rajashree Shedge (rajashree.shedge@rait.ac.in), Ramrao Adik Institute of Technology, Nerul, Navi Mumbai 400 706, Maharashtra, India |
| CitedBy_id | crossref_primary_10_1089_big_2022_0181 crossref_primary_10_1145_3508395 crossref_primary_10_58496_ADSA_2024_004 crossref_primary_10_1016_j_procs_2018_05_128 crossref_primary_10_1016_j_cmpb_2019_105189 crossref_primary_10_1016_j_procs_2019_06_092 crossref_primary_10_1080_15472450_2019_1612247 crossref_primary_10_1002_int_22728 crossref_primary_10_1142_S0219649221500519 |
| Cites_doi | 10.4156/jdcta.vol6.issue20.32 |
| ContentType | Journal Article |
| Copyright | 2016 The Authors |
| DOI | 10.1016/j.procs.2016.03.127 |
| Discipline | Computer Science |
| EISSN | 1877-0509 |
| EndPage | 1012 |
| ExternalDocumentID | 10_1016_j_procs_2016_03_127 S1877050916002581 |
| ISICitedReferencesCount | 33 |
| ISSN | 1877-0509 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | HDFS; Hadoop; Small files problem; MapReduce |
| Language | English |
| License | http://creativecommons.org/licenses/by-nc-nd/4.0 |
| OpenAccessLink | https://dx.doi.org/10.1016/j.procs.2016.03.127 |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 2016 |
| PublicationDateYYYYMMDD | 2016-01-01 |
| PublicationDecade | 2010 |
| PublicationTitle | Procedia computer science |
| PublicationYear | 2016 |
| Publisher | Elsevier B.V |
| References | [1] Raut, Phakade. An Innovative Strategy for Improved Processing of Small Files in Hadoop. International Journal of Application or Innovation in Engineering and Management, vol. 3, pp. 278-280, July 2014. [2] Palmer. Hadoop: Strengths and Limitations in National Security Missions. SAP National Security Services, June 2012. [3] Korat, Pamu. Reduction of Data at Namenode in HDFS using HAR Technique. International Journal of Advanced Research in Computer Engineering and Technology, vol. 1, pp. 635-642, June 2012. [4] Chen, Wang, Fu, Zhao. An Improved Small File Processing Method for HDFS. International Journal of Digital Content Technology and its Applications (JDCTA), vol. 6, pp. 296-304, Nov 2012. doi:10.4156/jdcta.vol6.issue20.32. [5] Maniyam, S., Kerzner, M. Hadoop Illuminated (1st ed.). Houston: Hadoop Illuminated LLC, 2013. [6] Beal, V. HDFS. Retrieved Nov 27, 2014, from http://www.webopedia.com/TERM/H/hadoop_distributed_file_system_hdfs.html. [7] Bappalige, S.P. An introduction to Apache Hadoop for big data. Retrieved Nov 27, 2014, from http://opensource.com/life/14/8/intro-apache-hadoop-big-data. [8] Panchal, Gohil. Efficient Ways to Improve the Performance of HDFS for Small Files. Computer Engineering and Intelligent Systems, vol. 5, pp. 45-49, August 2014. [9] Dave, P. Learning Basics of Big Data in 21 Days. Retrieved Nov 26, 2014, from http://blog.sqlauthority.com/2013/10/30/big-data learning-basics-of-big-data-in-21-days-bookmark/. [10] Nupairoj, Vorapongkitipun, Chatuporn. Improving Performance of Small-File Accessing in Hadoop. 11th International Joint Conference on Computer Science and Software Engineering, pp. 200-205, May 2014. [11] Chansler, R., Kuang, H., Radia, S., Shvachko, K. The Architecture of Open Source Applications (1st ed.). Brea: aosabook, 2013. [12] Gurav, Jayakar. Efficient Way for Handling Small Files using Extended HDFS. International Journal of Computer Science and Mobile Computing, vol. 3, pp. 785-789, June 2014. |
| StartPage | 1001 |
| SubjectTerms | Hadoop; HDFS; MapReduce; Small files problem |
| Title | Dealing with Small Files Problem in Hadoop Distributed File System |
| URI | https://dx.doi.org/10.1016/j.procs.2016.03.127 |
| Volume | 79 |
| WOSCitedRecordID | wos000375222800124 |
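The abstract's central point, that a single server holds the metadata for every file in HDFS, can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative and is not taken from the paper. It assumes the widely quoted rule of thumb that each file or block object costs roughly 150 bytes of NameNode heap, an assumed 128 MiB block size, and it ignores directories, replica bookkeeping, and the index files that merging schemes such as Hadoop Archives (HAR) add. The helper names `namenode_objects`, `namenode_bytes`, and `packed_file_count` are hypothetical.

```python
"""Back-of-the-envelope NameNode memory estimate for the HDFS small files problem.

Assumptions (not from the paper): ~150 bytes of NameNode heap per file or
block object, a 128 MiB block size, and directories, replica bookkeeping,
and archive index files are all ignored.
"""

MIB = 1024 * 1024
BYTES_PER_OBJECT = 150   # assumed rule-of-thumb heap cost per namespace object
BLOCK_SIZE = 128 * MIB   # assumed HDFS block size


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def namenode_objects(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Count one file object plus one object per block for each non-empty file."""
    blocks_per_file = ceil_div(file_size, block_size)
    return num_files * (1 + blocks_per_file)


def namenode_bytes(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Estimated NameNode heap, in bytes, for num_files files of file_size bytes."""
    return namenode_objects(num_files, file_size, block_size) * BYTES_PER_OBJECT


def packed_file_count(num_files: int, file_size: int, container_size: int = BLOCK_SIZE) -> int:
    """Number of container files needed if the small files are merged into larger ones."""
    return ceil_div(num_files * file_size, container_size)


if __name__ == "__main__":
    n_small, small_size = 10_000_000, 1 * MIB
    before = namenode_bytes(n_small, small_size)
    n_packed = packed_file_count(n_small, small_size)
    after = namenode_bytes(n_packed, BLOCK_SIZE)
    print(f"{n_small:,} files of 1 MiB: ~{before / 1e9:.2f} GB of NameNode heap")
    print(f"merged into {n_packed:,} files of 128 MiB: ~{after / 1e6:.2f} MB")
```

Under these assumptions, 10 million 1 MiB files create 20 million namespace objects (about 3 GB of NameNode heap), while the same data merged into 78,125 files of 128 MiB needs about 23 MB. This gap is why many of the approaches surveyed combine small files into larger containers.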