Investigating the performance of Hadoop and Spark platforms on machine learning algorithms

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 77, no. 2, pp. 1273-1300
Main Authors: Mostafaeipour, Ali; Jahangard Rafsanjani, Amir; Ahmadi, Mohammad; Arockia Dhanraj, Joshuva
Format: Journal Article
Language: English
Published: New York: Springer US; Springer Nature B.V., 01.02.2021
ISSN: 0920-8542, 1573-0484
Abstract: One of the most challenging issues in big data research is the inability to process a large volume of information in a reasonable time. Hadoop and Spark are two frameworks for distributed data processing. Hadoop is a very popular, general-purpose platform for big data processing. Spark, an open-source framework, is well suited to iterative algorithms because of its in-memory programming model. In this paper, the Hadoop and Spark big data processing platforms are evaluated and compared in terms of runtime, memory usage, network usage, and central processor efficiency. To this end, the K-nearest neighbor (KNN) algorithm is implemented on datasets of different sizes within both frameworks. The results show that the KNN algorithm runs 4 to 4.5 times faster on Spark than on Hadoop. The evaluations also show that Hadoop uses more resources, including central processor and network. It is concluded that Spark uses the CPU more efficiently than Hadoop. On the other hand, Hadoop uses less memory than Spark.
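For readers unfamiliar with the benchmarked algorithm, the following is a minimal single-machine sketch of K-nearest neighbor classification. It is not the authors' Hadoop or Spark implementation; the function name `knn_predict`, the toy data, and the choice of Euclidean distance with majority voting are assumptions made purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs (hypothetical layout);
    distances are Euclidean.
    """
    # Sort the training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented for this sketch (not taken from the paper).
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
    ((1.2, 0.8), "b"),
]

print(knn_predict(train, (0.2, 0.1)))
```

Every query is compared against the whole training set, so the cost grows with dataset size, which is presumably why the algorithm is a useful workload for comparing distributed platforms.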
Author affiliations:
Mostafaeipour, Ali: Industrial Engineering Department, Yazd University
Jahangard Rafsanjani, Amir: Computer Engineering Department, Yazd University
Ahmadi, Mohammad: Computer Engineering Department, Yazd University
Arockia Dhanraj, Joshuva: Centre for Automation and Robotics (ANRO), Department of Mechanical Engineering, Hindustan Institute of Technology and Science (ORCID: 0000-0001-5048-7775; email: joshuva1991@gmail.com)
Copyright: Springer Science+Business Media, LLC, part of Springer Nature 2020
DOI: 10.1007/s11227-020-03328-5
Keywords: Big data, Spark, Ganglia, Machine learning, Hadoop, MapReduce
Journal subtitle: An International Journal of High-Performance Computer Design, Analysis, and Use
Subject terms: Algorithms; Big Data; Compilers; Computer Science; Data processing; Interpreters; Iterative algorithms; Iterative methods; K-nearest neighbors algorithm; Machine learning; Microprocessors; Processor Architectures; Programming Languages; Run time (computers)