Investigating the performance of Hadoop and Spark platforms on machine learning algorithms
One of the most challenging issues in the big data research area is the inability to process a large volume of information in a reasonable time. Hadoop and Spark are two frameworks for distributed data processing. Hadoop is a very popular and general platform for big data processing. Because of the...
| Published in: | The Journal of supercomputing Vol. 77; no. 2; pp. 1273 - 1300 |
|---|---|
| Main Authors: | Mostafaeipour, Ali; Jahangard Rafsanjani, Amir; Ahmadi, Mohammad; Arockia Dhanraj, Joshuva |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.02.2021 (Springer Nature B.V) |
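| Subjects: | Algorithms; Big Data; Compilers; Computer Science; Data processing; Interpreters; Iterative algorithms; Iterative methods; K-nearest neighbors algorithm; Machine learning; Microprocessors; Processor Architectures; Programming Languages; Run time (computers) |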
| ISSN: | 0920-8542, 1573-0484 |
| Abstract | One of the most challenging issues in big data research is the inability to process a large volume of information in a reasonable time. Hadoop and Spark are two frameworks for distributed data processing. Hadoop is a very popular, general-purpose platform for big data processing. Spark, an open-source framework, is well suited to iterative algorithms because of its in-memory programming model. In this paper, the Hadoop and Spark frameworks are evaluated and compared in terms of runtime, memory and network usage, and central processor efficiency. To this end, the K-nearest neighbor (KNN) algorithm is implemented on datasets of different sizes within both frameworks. The results show that the KNN algorithm runs 4 to 4.5 times faster on Spark than on Hadoop. The evaluations also show that Hadoop uses more resources, including central processor and network. It is concluded that Spark uses the CPU more efficiently than Hadoop. On the other hand, Hadoop uses less memory than Spark. |
|---|---|
| Author | Ahmadi, Mohammad; Jahangard Rafsanjani, Amir; Mostafaeipour, Ali; Arockia Dhanraj, Joshuva |
| Author_xml | 1. Mostafaeipour, Ali (Industrial Engineering Department, Yazd University); 2. Jahangard Rafsanjani, Amir (Computer Engineering Department, Yazd University); 3. Ahmadi, Mohammad (Computer Engineering Department, Yazd University); 4. Arockia Dhanraj, Joshuva (Centre for Automation and Robotics (ANRO), Department of Mechanical Engineering, Hindustan Institute of Technology and Science; ORCID 0000-0001-5048-7775; email joshuva1991@gmail.com) |
| CitedBy_id | crossref_primary_10_1007_s11227_021_04072_0 crossref_primary_10_1016_j_measurement_2022_111174 crossref_primary_10_1109_ACCESS_2023_3307512 crossref_primary_10_1155_2022_7660071 crossref_primary_10_3390_bdcc5020021 crossref_primary_10_3390_bdcc6020038 crossref_primary_10_1007_s11227_022_04381_y crossref_primary_10_3390_risks11080145 crossref_primary_10_3390_app14010452 crossref_primary_10_1007_s11042_023_17330_5 crossref_primary_10_1007_s41060_025_00753_8 crossref_primary_10_1063_5_0191442 crossref_primary_10_3390_info14020093 crossref_primary_10_1007_s11042_023_14562_3 crossref_primary_10_2478_ttj_2020_0023 crossref_primary_10_1155_2022_7861756 crossref_primary_10_3724_SP_J_1249_2025_03317 crossref_primary_10_1155_2022_3443182 crossref_primary_10_1109_ACCESS_2023_3262989 crossref_primary_10_1186_s40537_022_00623_1 crossref_primary_10_2478_amns_2024_1956 crossref_primary_10_3390_en16114446 crossref_primary_10_1155_2022_9095330 crossref_primary_10_1007_s11227_023_05443_5 crossref_primary_10_1093_comjnl_bxad017 crossref_primary_10_1108_DTA_06_2021_0153 crossref_primary_10_3390_info15040178 crossref_primary_10_1016_j_sasc_2024_200096 crossref_primary_10_1007_s11042_023_17932_z crossref_primary_10_3233_IDA_226774 crossref_primary_10_3390_ijgi10110763 crossref_primary_10_1108_IJICC_10_2020_0157 crossref_primary_10_1109_ACCESS_2022_3226334 crossref_primary_10_1007_s10586_024_04478_4 crossref_primary_10_2478_amns_2024_0416 |
| ContentType | Journal Article |
| Copyright | Springer Science+Business Media, LLC, part of Springer Nature 2020. |
| DBID | AAYXX CITATION JQ2 |
| DOI | 10.1007/s11227-020-03328-5 |
| DatabaseName | CrossRef ProQuest Computer Science Collection |
| DatabaseTitle | CrossRef ProQuest Computer Science Collection |
| DatabaseTitleList | ProQuest Computer Science Collection |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1573-0484 |
| EndPage | 1300 |
| ExternalDocumentID | 10_1007_s11227_020_03328_5 |
| IEDL.DBID | RSV |
| ISICitedReferencesCount | 66 |
| ISSN | 0920-8542 |
| IngestDate | Thu Sep 25 00:51:20 EDT 2025 Sat Nov 29 04:27:39 EST 2025 Tue Nov 18 22:42:42 EST 2025 Fri Feb 21 02:49:09 EST 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Keywords | Big data; Spark; Ganglia; Machine learning; Hadoop; MapReduce |
| Language | English |
| LinkModel | DirectLink |
| ORCID | 0000-0001-5048-7775 |
| PQID | 2480786797 |
| PQPubID | 2043774 |
| PageCount | 28 |
| ParticipantIDs | proquest_journals_2480786797 crossref_citationtrail_10_1007_s11227_020_03328_5 crossref_primary_10_1007_s11227_020_03328_5 springer_journals_10_1007_s11227_020_03328_5 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-02-01 |
| PublicationDateYYYYMMDD | 2021-02-01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationSubtitle | An International Journal of High-Performance Computer Design, Analysis, and Use |
| PublicationTitle | The Journal of supercomputing |
| PublicationTitleAbbrev | J Supercomput |
| PublicationYear | 2021 |
| Publisher | Springer US; Springer Nature B.V |
| SSID | ssj0004373 |
| Score | 2.485159 |
| SourceID | proquest crossref springer |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 1273 |
| SubjectTerms | Algorithms; Big Data; Compilers; Computer Science; Data processing; Interpreters; Iterative algorithms; Iterative methods; K-nearest neighbors algorithm; Machine learning; Microprocessors; Processor Architectures; Programming Languages; Run time (computers) |
| Title | Investigating the performance of Hadoop and Spark platforms on machine learning algorithms |
| URI | https://link.springer.com/article/10.1007/s11227-020-03328-5 https://www.proquest.com/docview/2480786797 |
| Volume | 77 |
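
The abstract describes running the K-nearest neighbor (KNN) algorithm on partitioned data under both Hadoop and Spark. The record does not include the authors' implementation, so the following is only a minimal single-machine sketch of how a partitioned KNN classification could be structured in map/reduce style: each partition reports its k closest labelled points, and the candidates are merged before a majority vote. The function names (`local_top_k`, `merge_top_k`, `knn_classify`) and the Euclidean distance are assumptions made for illustration, not details taken from the paper.

```python
import heapq
import math
from collections import Counter


def local_top_k(partition, query, k):
    """Map step (hypothetical): the k nearest (distance, label) pairs in one partition."""
    scored = ((math.dist(point, query), label) for point, label in partition)
    return heapq.nsmallest(k, scored)


def merge_top_k(left, right, k):
    """Reduce step (hypothetical): merge two candidate lists and keep the k closest."""
    return heapq.nsmallest(k, left + right)


def knn_classify(partitions, query, k):
    """Classify `query` by majority vote over the k nearest points across all partitions."""
    candidates = []
    for partition in partitions:
        candidates = merge_top_k(candidates, local_top_k(partition, query, k), k)
    votes = Counter(label for _, label in candidates)
    return votes.most_common(1)[0][0]
```

Because each partition only ever sends k candidates to the merge step, the amount of data exchanged between workers stays small regardless of partition size. In a real cluster the loop over partitions would be replaced by the framework's own map and reduce operators.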