Distributed Computing Engines for Big Data Analytics

Technologies such as cloud computing paved the way for dealing with massive amounts of data. Before the cloud, this was not possible without investing heavily in computing resources. Now there is an ecosystem conducive to storing and processing voluminous data that cannot be handled by local compu...

Bibliographic Details
Published in: International Journal of Recent Technology and Engineering, Vol. 8, no. 2, pp. 5841-5845
Main Authors: Prashanthi, Bh, Sowjanya, G., Madhuri, D. Krishna
Format: Journal Article
Language: English
Published: 30.07.2019
ISSN: 2277-3878
Abstract: Technologies such as cloud computing paved the way for dealing with massive amounts of data. Before the cloud, this was not possible without investing heavily in computing resources. Now there is an ecosystem conducive to storing and processing voluminous data that cannot be handled by local computing resources. With this ecosystem, big data technology came into existence. Big data is data characterized by volume, velocity, veracity and variety. This has enabled enterprises to place more value on every piece of data, which in turn has increased the use of the cloud for both storage and processing. Processing big data requires efficient technologies. The MapReduce programming paradigm, together with the Hadoop distributed programming framework, is widely used. However, there are other emerging frameworks, such as Apache Spark and Apache Flink, designed to handle big data more efficiently. In this paper, an empirical study is made of three frameworks (Hadoop, Apache Spark and Apache Flink) under different parameters such as network type, HDFS block size, input data size and other configuration changes. The experimental results, evaluated with different benchmark big data workloads, reveal that Apache Spark and Apache Flink outperform Hadoop.
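For readers unfamiliar with the MapReduce paradigm named in the abstract, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to word counting. It is an illustration only and is not taken from the paper: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and a real Hadoop, Spark or Flink job would distribute these phases across a cluster rather than run them in one Python process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big value", "data in the cloud"]))
```

The sketch keeps each phase as a separate function so that the structure of the paradigm stays visible; in a distributed engine, the map and reduce functions run in parallel on different nodes and the shuffle moves data over the network.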
Corporate Author: Assistant Professor, Dept. of CSE, GRIET
DOI: 10.35940/ijrte.B3771.078219