Python Data Analyst's Toolkit - Learn Python and Python-Based Libraries with Applications in Data Analysis and Statistics
Explore the fundamentals of data analysis and statistics with case studies using Python. This book will show you how to confidently write code in Python and use various Python libraries and functions for analyzing any dataset. The code is presented in Jupyter notebooks that can further be adapted...
| Main author: | |
|---|---|
| Format: | eBook, Book |
| Language: | English |
| Published: | Berkeley, CA: Apress, an imprint of Springer Nature, 2021 (Apress; Apress L. P) |
| Edition: | 1 |
| Subjects: | |
| ISBN: | 9781484263983, 1484263987, 9781484263990, 1484263995 |
| Online access: | Get full text |
Contents:
- Title Page -- Introduction -- Table of Contents -- 1. Getting Familiar with Python -- 2. Exploring Containers, Classes, and Objects -- 3. Regular Expressions and Math with Python -- 4. Descriptive Data Analysis Basics -- 5. Working with NumPy Arrays -- 6. Prepping Your Data with Pandas -- 7. Data Visualization with Python Libraries -- 8. Data Analysis Case Studies -- 9. Statistics and Probability with Python -- Index
- Intro -- Table of Contents -- About the Author -- About the Technical Reviewer -- Acknowledgments -- Introduction -- Chapter 1: Getting Familiar with Python -- Technical requirements -- Getting started with Jupyter notebooks -- Shortcuts and other features in Jupyter -- Tab Completion -- Magic commands used in Jupyter -- Python Basics -- Comments, print, and input -- Comments -- Printing -- Input -- Variables and Constants -- Operators -- Assignment operators -- Data types -- Working with Strings -- Conditional statements -- Loops -- While loop -- for loop -- Functions -- Syntax errors and exceptions -- Working with files -- Reading from a file -- Writing to a file -- Modules in Python -- Python Enhancement Proposal (PEP) 8 - standards for writing code -- Summary -- Review Exercises -- Chapter 2: Exploring Containers, Classes, and Objects -- Containers -- Lists -- Creating new lists from existing lists -- Accessing the index of items in a list -- Concatenating of lists -- Tuples -- Methods used with a tuple -- Applications of tuples -- Dictionaries -- Sets -- Object-oriented programming -- Object-oriented programming principles -- Summary -- Review Exercises -- Chapter 3: Regular Expressions and Math with Python -- Regular expressions -- Steps for solving problems with regular expressions -- Python functions for regular expressions -- Metacharacters -- Using Sympy for math problems -- Factorization of an algebraic expression -- Solving algebraic equations (for one variable) -- Solving simultaneous equations (for two variables) -- Solving expressions entered by the user -- Solving simultaneous equations graphically -- Creating and manipulating sets -- Union and intersection of sets -- Finding the probability of an event -- Solving questions in calculus -- Limit of a function -- Derivative of a function -- Integral of a function -- Summary
- Review Exercises -- Chapter 4: Descriptive Data Analysis Basics -- Descriptive data analysis - Steps -- Structure of data -- Classifying data into different levels -- Visualizing various levels of data -- Plotting mixed data -- Summary -- Review Exercises -- Chapter 5: Working with NumPy Arrays -- Getting familiar with arrays and NumPy functions -- Creating an array -- Reshaping an array -- Combining arrays -- Testing for conditions -- Broadcasting, vectorization, and arithmetic operations -- Obtaining the properties of an array -- Slicing or selecting a subset of data -- Obtaining descriptive statistics/aggregate measures -- Matrices -- Summary -- Review Exercises -- Chapter 6: Prepping Your Data with Pandas -- Pandas at a glance -- Technical requirements -- Building blocks of Pandas -- Examining the properties of a Series -- DataFrames -- Creating DataFrames by importing data from other formats -- From a CSV file: -- From an Excel file: -- From a JSON file: -- From an HTML file: -- Accessing attributes in a DataFrame -- Accessing the values in the DataFrame -- Modifying DataFrame objects -- Renaming columns -- Replacing values or observations in a DataFrame -- Adding a new column to a DataFrame -- Inserting rows in a DataFrame -- Deleting columns from a DataFrame -- Deleting a row from a DataFrame -- Indexing -- Type of an index object -- Creating a custom index and using columns as indexes -- Indexes and speed of data retrieval -- Searching without using an index -- Search using an index -- Immutability of an index -- Alignment of indexes -- Set operations on indexes -- Union operation -- Difference operation -- Symmetric difference operation -- Data types in Pandas -- Obtaining information about data types -- Get the count of each data type -- Select particular data types -- Calculating the memory usage and changing data types of columns
- Indexers and selection of subsets of data -- Understanding loc and iloc indexers -- Selecting consecutive rows -- Selecting consecutive columns -- Selecting a single row -- Selecting rows using their index labels -- Selecting columns using their name -- Using negative index values for selection -- Selecting nonconsecutive rows and columns -- Other (less commonly used) indexers for data access -- ix indexer -- The indexing operator - [ ] -- at and iat indexers -- Boolean indexing for selecting subsets of data -- Using the query method to retrieve data -- Further reading -- Operators in Pandas -- Representing dates and times in Pandas -- Converting strings into Pandas Timestamp objects -- Extracting the components of a Timestamp object -- Further reading -- Grouping and aggregation -- Examining the properties of the groupby object -- Data type of groupby object -- Obtaining the names of the groups -- Returning records with the same position in each group using the nth method -- Get all the data for a particular group using the get_group method -- Filtering groups -- Transform method and groupby -- Apply method and groupby -- How to combine objects in Pandas -- Append method for adding rows -- Understanding the various types of joins -- Concat function (adding rows or columns from other objects) -- Join method - index to index -- Merge method - SQL type join based on common columns -- Restructuring data and dealing with anomalies -- Dealing with missing data -- Dropping the missing data -- Imputation -- Data duplication -- Tidy data and techniques for restructuring data -- Conversion from wide to long format (tidy data) -- Stack method (wide-to-long format conversion) -- Melt method (wide-to-long format conversion) -- Pivot method (long-to-wide conversion) -- Summary -- Review Exercises -- Chapter 7: Data Visualization with Python Libraries
- Technical requirements -- External files -- Commonly used plots -- Matplotlib -- Approach for plotting using Matplotlib -- Plotting using Pandas -- Scatter plot -- Histogram -- Pie charts -- Seaborn library -- Box plots -- Adding arguments to any Seaborn plotting function -- Kernel density estimate -- Violin plot -- Count plots -- Heatmap -- Facet grid -- Regplot -- lmplot -- Strip plot -- Swarm plot -- Catplot -- Pair plot -- Joint plot -- Summary -- Review Exercises -- Chapter 8: Data Analysis Case Studies -- Technical requirements -- Methodology -- Case study 8-1: Highest grossing movies in France - analyzing unstructured data -- Case study 8-2: Use of data analysis for air quality management -- Case study 8-3: Worldwide COVID-19 cases - an analysis -- Summary -- Review Exercises -- Chapter 9: Statistics and Probability with Python -- Permutations and combinations -- Probability -- Rules of probability -- Conditional probability -- Bayes theorem -- Application of Bayes theorem in medical diagnostics -- Another application of Bayes theorem: Email spam classification -- SciPy library -- Probability distributions -- Binomial distribution -- The shape of a binomial distribution -- Poisson distribution -- The shape of a Poisson distribution -- Continuous probability distributions -- Normal distribution -- Standard normal distribution -- Solved examples: Standard normal distribution -- Measures of central tendency -- Measures of dispersion -- Measures of shape -- Sampling -- Probability sampling -- Non-probability sampling -- Central limit theorem -- Estimates and confidence intervals -- Types of errors in sampling -- Hypothesis testing -- Basic concepts in hypothesis testing -- Key terminology used in hypothesis testing -- Steps involved in hypothesis testing -- One-sample z-test -- Two-sample z-test -- Hypothesis tests with proportions
- Two-sample z-test for the population proportions -- T-distribution -- One sample t-test -- Two-sample t-test -- Two-sample t-test for paired samples -- Solved examples: Conducting t-tests using Scipy functions -- ANOVA -- Chi-square test of association -- Summary -- Review Exercises -- Bibliography -- Index

