An overview and comparison of free Python libraries for data mining and big data analysis

The popularity of Python is growing, especially in the field of data science. Consequently, there is an increasing number of free libraries available for usage. The aim of this review paper is to describe and compare the characteristics of different data mining and big data analysis libraries in Pyt...

Detailed description

Bibliographic details
Published in: 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 977-982
Main authors: Stancin, I., Jovic, A.
Format: Conference proceeding
Language: English
Published: Croatian Society MIPRO, 1 May 2019
Subjects:
ISSN: 2623-8764
Online access: Full text
Abstract The popularity of Python is growing, especially in the field of data science. Consequently, there is an increasing number of free libraries available for use. The aim of this review paper is to describe and compare the characteristics of different data mining and big data analysis libraries in Python. There is currently no paper dealing with the subject and describing the pros and cons of all these libraries. Here we consider more than 20 libraries and separate them into six groups: core libraries, data preparation, data visualization, machine learning, deep learning and big data. Besides the functionalities of a given library, important factors for comparison are the number of contributors developing and maintaining the library and the size of the community. A bigger community means a better chance of easily finding a solution to a given problem. We currently recommend: pandas for data preparation; Matplotlib, seaborn or Plotly for data visualization; scikit-learn for machine learning; TensorFlow, Keras and PyTorch for deep learning; and Hadoop Streaming and PySpark for big data.
Author Jovic, A.
Stancin, I.
Author_xml – sequence: 1
  givenname: I.
  surname: Stancin
  fullname: Stancin, I.
  email: stancin.igor@gmail.com
  organization: Department of Electronics, University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, 10 000, Croatia
– sequence: 2
  givenname: A.
  surname: Jovic
  fullname: Jovic, A.
  email: alan.jovic@fer.hr
  organization: Department of Electronics, University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, 10 000, Croatia
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.23919/MIPRO.2019.8757088
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9532330984
9789532330984
EISSN 2623-8764
EndPage 982
ExternalDocumentID 8757088
Genre orig-research
GroupedDBID 6IE
6IL
ALMA_UNASSIGNED_HOLDINGS
CBEJK
M43
RIE
RIL
ID FETCH-LOGICAL-i241t-be4d61621a7a2ba4be71425e213cdfcea6db92d0f6455914cc1b228c541971e13
IEDL.DBID RIE
ISICitedReferencesCount 83
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000484544500175&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Mon Jul 08 05:38:50 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i241t-be4d61621a7a2ba4be71425e213cdfcea6db92d0f6455914cc1b228c541971e13
PageCount 6
ParticipantIDs ieee_primary_8757088
PublicationCentury 2000
PublicationDate 2019-05-01
PublicationDateYYYYMMDD 2019-05-01
PublicationDate_xml – month: 05
  year: 2019
  text: 2019-05-01
  day: 01
PublicationDecade 2010
PublicationTitle 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)
PublicationTitleAbbrev MIPRO
PublicationYear 2019
Publisher Croatian Society MIPRO
Publisher_xml – name: Croatian Society MIPRO
SSID ssib042470096
Score 2.0370007
Snippet The popularity of Python is growing, especially in the field of data science. Consequently, there is an increasing number of free libraries available for...
SourceID ieee
SourceType Publisher
StartPage 977
SubjectTerms Big Data
big data analysis
data mining
Data science
Data visualization
framework
Libraries
Machine learning
machine learning library
Python
Regression tree analysis
Title An overview and comparison of free Python libraries for data mining and big data analysis
URI https://ieeexplore.ieee.org/document/8757088
WOSCitedRecordID wos000484544500175&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=2019+42nd+International+Convention+on+Information+and+Communication+Technology%2C+Electronics+and+Microelectronics+%28MIPRO%29&rft.atitle=An+overview+and+comparison+of+free+Python+libraries+for+data+mining+and+big+data+analysis&rft.au=Stancin%2C+I.&rft.au=Jovic%2C+A.&rft.date=2019-05-01&rft.pub=Croatian+Society+MIPRO&rft.eissn=2623-8764&rft.spage=977&rft.epage=982&rft_id=info:doi/10.23919%2FMIPRO.2019.8757088&rft.externalDocID=8757088