Hands-On Data Preprocessing in Python: Learn how to effectively prepare data for successful data analytics

Bibliographic Details
Main Author: Jafari, Roy
Format: eBook
Language: English
Published: Birmingham: Packt Publishing, 2022
Packt Publishing, Limited
Packt Publishing Limited
Edition: 1
Abstract
Get your raw data cleaned up and ready for processing to design better data analytic solutions.

Key Features
- Develop the skills to perform data cleaning, data integration, data reduction, and data transformation
- Make the most of your raw data with powerful data transformation and massaging techniques
- Perform thorough data cleaning, including dealing with missing values and outliers

Book Description
Hands-On Data Preprocessing is a primer on the best data cleaning and preprocessing techniques, written by an expert who’s developed college-level courses on data preprocessing and related subjects. With this book, you’ll be equipped with the optimum data preprocessing techniques from multiple perspectives, ensuring that you get the best possible insights from your data. You'll learn about different technical and analytical aspects of data preprocessing – data collection, data cleaning, data integration, data reduction, and data transformation – and get to grips with implementing them using the open-source Python programming environment. The hands-on examples and easy-to-follow chapters will help you gain a comprehensive understanding of data preprocessing, its whys and hows, and identify opportunities where data analytics could lead to more effective decision making. As you progress through the chapters, you’ll also understand the role of data management systems and technologies for effective analytics and how to use APIs to pull data. By the end of this Python data preprocessing book, you'll be able to use Python to read, manipulate, and analyze data; perform data cleaning, integration, reduction, and transformation techniques; and handle outliers or missing values to effectively prepare data for analytic tools.
What you will learn
- Use Python to perform analytics functions on your data
- Understand the role of databases and how to effectively pull data from databases
- Perform data preprocessing steps defined by your analytics goals
- Recognize and resolve data integration challenges
- Identify the need for data reduction and execute it
- Detect opportunities to improve analytics with data transformation

Who this book is for
This book is for junior and senior data analysts, business intelligence professionals, engineering undergraduates, and data enthusiasts looking to perform preprocessing and data cleaning on large amounts of data. You don’t need any prior experience with data preprocessing to get started with this book. However, basic programming skills, such as working with variables, conditionals, and loops, along with beginner-level knowledge of Python and simple analytics experience, are a prerequisite.
AbstractList
This book will make the link between data cleaning and preprocessing to help you make effective business decisions using data analytics.

Key Features
- Become well-versed with the core concepts of data cleaning, data fusion, data reduction, and data integration
- Get ready to make the most of your data with powerful data transformation and massaging techniques
- Learn how to apply Multi-Layered Perceptron (MLP) to clean and create issue-free data

Book Description
Data preprocessing is the first step in data visualization, data analytics, and machine learning, where data is prepared for analytics functions to get the best possible insights. Around 90% of the time spent on data analytics, data visualization, and machine learning projects is dedicated to performing data preprocessing. This book will equip you with optimum data preprocessing techniques from multiple perspectives. You'll learn different technical and analytical aspects of data preprocessing - data collection, data cleaning, data integration, data reduction, and data transformation - and get to grips with implementing them using the open-source Python programming environment. The book will provide a comprehensive understanding of data preprocessing, its whys and hows, and help you identify opportunities where data analytics could lead to more effective decision making. It also demonstrates the role of data management systems and technologies for effective analytics and how to create queries to pull data from relational databases. By the end of this Python data preprocessing book, you'll be able to use Python to read, manipulate, and analyze data; perform data cleaning, integration, and reduction techniques; handle outliers or missing values; and implement the appropriate data transformation method.

What you will learn
- Use Python to perform analytics functions on your data
- Learn the role of databases and connect to them effectively for your analytics requirements
- Perform data cleaning and preprocessing defined by your analytics goals
- Understand and resolve the challenges faced while performing data integration
- Discover different data reduction methods and learn how to execute them effectively
- Explore a variety of data transformation methods and choose the most suitable method for your use case

Who This Book Is For
Junior and senior data analysts, business intelligence professionals, engineering undergraduates, and data enthusiasts looking to perform preprocessing and data cleaning on large amounts of data will find this book useful. Basic programming skills, such as working with variables, conditionals, and loops, along with beginner-level knowledge of Python and simple analytics experience, are assumed.

Table of Contents
Review of the Core Modules NumPy and Pandas -- Review of Another Core Module: Matplotlib -- Data - What Is It Really? -- Databases -- Data Visualization -- Prediction -- Classification -- Clustering Analysis -- Data Cleaning Level I - Clean Up the Table -- Data Cleaning Level II - Unpack, Restructure, and Reformulate the Table -- Data Cleaning Level III - Missing Values, Outliers, and Errors -- Data Fusion and Integration -- Data Reduction -- Data Massaging and Transformation -- Case Study 1: Mental Health in Tech -- Case Study 2: Predict COVID Hospitalization -- Case Study 3: United States Counties Clustering Analysis -- Practice Cases
Author Jafari, Roy
ContentType eBook
Copyright 2022 Packt Publishing
DEWEY 005.133
DOI 10.0000/9781801079952
Discipline Computer Science
EISBN 1801079951
9781801079952
Edition 1
ExternalDocumentID 9781801079952
EBC6845755
10162796
IsPeerReviewed false
IsScholarly false
LCCallNum_Ident QA76.73.P98 J343 2022
Language English
OCLC 1292358120
PQID EBC6845755
PageCount 602
PublicationCentury 2000
PublicationDate 2022
[2022]
2022-01-21
PublicationDateYYYYMMDD 2022-01-01
2022-01-21
PublicationDecade 2020
PublicationPlace Birmingham, UK
PublicationYear 2022
Publisher Packt Publishing
Packt Publishing, Limited
Packt Publishing Limited
RestrictionsOnAccess restricted access
SourceID askewsholts
walterdegruyter
proquest
ieee
SourceType Aggregation Database
Publisher
SubjectTerms COM018000 COMPUTERS / Data Processing
COMPUTERS / Data Visualization
COMPUTERS / Database Management / Data Warehousing
Computing and Processing
Electronic data processing
Python (Computer program language)
Subtitle Learn how to effectively prepare data for successful data analytics
TableOfContents Review of the Core Modules of NumPy and Pandas -- Review of Another Core Module - Matplotlib -- Data - What Is It Really? -- Databases -- Data Visualization -- Prediction -- Classification -- Clustering Analysis -- Data Cleaning Level I - Cleaning Up the Table -- Data Cleaning Level II - Unpacking, Restructuring, and Reformulating the Table -- Data Cleaning Level III - Missing Values, Outliers, and Errors -- Data Fusion and Data Integration -- Data Reduction -- Data Transformation and Massaging -- Case Study 1 - Mental Health in Tech -- Case Study 2 - Predicting COVID-19 Hospitalizations -- Case Study 3 - United States Counties Clustering Analysis -- Summary, Practice Case Studies, and Conclusions
Cover -- Copyright -- Contributors -- Table of Contents -- Preface
Part 1: Technical Needs
Chapter 1: Review of the Core Modules of NumPy and Pandas -- Technical requirements -- Overview of the Jupyter Notebook -- Are we analyzing data via computer programming? -- Overview of the basic functions of NumPy -- The np.arange() function -- The np.zeros() and np.ones() functions -- The np.linspace() function -- Overview of Pandas -- Pandas data access -- Boolean masking for filtering a DataFrame -- Pandas functions for exploring a DataFrame -- Pandas applying a function -- The Pandas groupby function -- Pandas multi-level indexing -- Pandas pivot and melt functions -- Summary -- Exercises
Chapter 2: Review of Another Core Module - Matplotlib -- Technical requirements -- Drawing the main plots in Matplotlib -- Summarizing numerical attributes using histograms or boxplots -- Observing trends in the data using a line plot -- Relating two numerical attributes using a scatterplot -- Modifying the visuals -- Adding a title to visuals and labels to the axis -- Adding legends -- Modifying ticks -- Modifying markers -- Subplots -- Resizing visuals and saving them -- Resizing -- Saving -- Example of Matplotlib assisting data preprocessing -- Summary -- Exercises
Chapter 3: Data - What Is It Really? -- Technical requirements -- What is data? -- Why this definition? -- DIKW pyramid -- Data preprocessing for data analytics versus data preprocessing for machine learning -- The most universal data structure - a table -- Data objects -- Data attributes -- Types of data values -- Analytics standpoint -- Programming standpoint -- Information versus pattern -- Understanding everyday use of the word "information" -- Statistical use of the word "information" -- Statistical meaning of the word "pattern" -- Summary -- Exercises -- References
Chapter 4: Databases -- Technical requirements -- What is a database? -- Understanding the difference between a database and a dataset -- Types of databases -- The differentiating elements of databases -- Relational databases (SQL databases) -- Unstructured databases (NoSQL databases) -- A practical example that requires a combination of both structured and unstructured databases -- Distributed databases -- Blockchain -- Connecting to, and pulling data from, databases -- Direct connection -- Web page connection -- API connection -- Request connection -- Publicly shared -- Summary -- Exercises
Part 2: Analytic Goals
Chapter 5: Data Visualization -- Technical requirements -- Summarizing a population -- Example of summarizing numerical attributes -- Example of summarizing categorical attributes -- Comparing populations -- Example of comparing populations using boxplots -- Example of comparing populations using histograms -- Example of comparing populations using bar charts -- Investigating the relationship between two attributes -- Visualizing the relationship between two numerical attributes -- Visualizing the relationship between two categorical attributes -- Visualizing the relationship between a numerical attribute and a categorical attribute -- Adding visual dimensions -- Example of a five-dimensional scatter plot -- Showing and comparing trends -- Example of visualizing and comparing trends -- Summary -- Exercise
Chapter 6: Prediction -- Technical requirements -- Predictive models -- Forecasting -- Regression analysis -- Linear regression -- Example of applying linear regression to perform regression analysis -- MLP -- How does MLP work? -- Example of applying MLP to perform regression analysis -- Summary -- Exercises
Chapter 7: Classification -- Technical requirements -- Classification models -- Example of designing a classification model -- Classification algorithms -- KNN -- Example of using KNN for classification -- Decision Trees -- Example of using Decision Trees for classification -- Summary -- Exercises
Chapter 8: Clustering Analysis -- Technical requirements -- Clustering model -- Clustering example using a two-dimensional dataset -- Clustering example using a three-dimensional dataset -- K-Means algorithm -- Using K-Means to cluster a two-dimensional dataset -- Using K-Means to cluster a dataset with more than two dimensions -- Centroid analysis -- Summary -- Exercises
Part 3: The Preprocessing
Chapter 9: Data Cleaning Level I - Cleaning Up the Table -- Technical requirements -- The levels, tools, and purposes of data cleaning - a roadmap to chapters 9, 10, and 11 -- Purpose of data analytics -- Tools for data analytics -- Levels of data cleaning -- Mapping the purposes and tools of analytics to the levels of data cleaning -- Data cleaning level I - cleaning up the table -- Example 1 - unwise data collection -- Example 2 - reindexing (multi-level indexing) -- Example 3 - intuitive but long column titles -- Summary -- Exercises
Chapter 10: Data Cleaning Level II - Unpacking, Restructuring, and Reformulating the Table -- Technical requirements -- Example 1 - unpacking columns and reformulating the table -- Unpacking FileName -- Unpacking Content -- Reformulating a new table for visualization -- The last step - drawing the visualization -- Example 2 - restructuring the table -- Example 3 - level I and II data cleaning -- Level I cleaning -- Level II cleaning -- Doing the analytics - using linear regression to create a predictive model -- Summary -- Exercises
Chapter 11: Data Cleaning Level III - Missing Values, Outliers, and Errors -- Technical requirements -- Missing values -- Detecting missing values -- Example of detecting missing values -- Causes of missing values -- Types of missing values -- Diagnosis of missing values -- Dealing with missing values -- Outliers -- Detecting outliers -- Dealing with outliers -- Errors -- Types of errors -- Dealing with errors -- Detecting systematic errors -- Summary -- Exercises
Chapter 12: Data Fusion and Data Integration -- Technical requirements -- What are data fusion and data integration? -- Data fusion versus data integration -- Directions of data integration -- Frequent challenges regarding data fusion and integration -- Challenge 1 - entity identification -- Challenge 2 - unwise data collection -- Challenge 3 - index mismatched formatting -- Challenge 4 - aggregation mismatch -- Challenge 5 - duplicate data objects -- Challenge 6 - data redundancy -- Example 1 (challenges 3 and 4) -- Example 2 (challenges 2 and 3) -- Example 3 (challenges 1, 3, 5, and 6) -- Checking for duplicate data objects -- Designing the structure for the result of data integration -- Filling songIntegrate_df from billboard_df -- Filling songIntegrate_df from songAttribute_df -- Filling songIntegrate_df from artist_df -- Checking for data redundancy -- The analysis -- Example summary -- Summary -- Exercise
Chapter 13: Data Reduction -- Technical requirements -- The distinction between data reduction and data redundancy -- The objectives of data reduction -- Types of data reduction -- Performing numerosity data reduction -- Random sampling -- Stratified sampling -- Random over/undersampling -- Performing dimensionality data reduction -- Linear regression as a dimension reduction method -- Using a decision tree as a dimension reduction method -- Using random forest as a dimension reduction method -- Brute-force computational dimension reduction -- PCA -- Functional data analysis -- Summary -- Exercises
Chapter 14: Data Transformation and Massaging -- Technical requirements -- The whys of data transformation and massaging -- Data transformation versus data massaging -- Normalization and standardization -- Binary coding, ranking transformation, and discretization -- Example one - binary coding of nominal attribute -- Example two - binary coding or ranking transformation of ordinal attributes -- Example three - discretization of numerical attributes -- Understanding the types of discretization -- Discretization - the number of cut-off points -- A summary - from numbers to categories and back -- Attribute construction -- Example - construct one transformed attribute from two attributes -- Feature extraction -- Example - extract three attributes from one attribute -- Example - Morphological feature extraction -- Feature extraction examples from the previous chapters -- Log transformation -- Implementation - doing it yourself -- Implementation - the working module doing it for you -- Smoothing, aggregation, and binning -- Smoothing -- Aggregation -- Binning -- Summary -- Exercise
Part 4: Case Studies
Chapter 15: Case Study 1 - Mental Health in Tech -- Technical requirements -- Introducing the case study -- The audience of the results of analytics -- Introduction to the source of the data -- Integrating the data sources -- Cleaning the data -- Detecting and dealing with outliers and errors -- Detecting and dealing with missing values -- Analyzing the data -- Analysis question one - is there a significant difference between the mental health of employees across the attribute of gender? -- Analysis question two - is there a significant difference between the mental health of employees across the Age attribute? -- Analysis question three - do more supportive companies have mentally healthier employees? -- Analysis question four - does the attitude of individuals toward mental health influence their mental health and their seeking of treatments?
Title Hands-On Data Preprocessing in Python
URI https://ieeexplore.ieee.org/servlet/opac?bknumber=10162796
https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=6845755
https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781801079952