Practical machine learning in R

Guides professionals and students through the rapidly growing field of machine learning with hands-on examples in the popular R programming language. Machine learning, a branch of Artificial Intelligence (AI) that enables computers to improve their results and learn new approaches without explicit in...

Bibliographic Details
Authors:	Fred Nwanganga, Mike Chapple
Format:	eBook
Language:	English
Published:	Newark: John Wiley & Sons, Incorporated, 2020
Edition:	1
Subjects:	Machine learning
ISBN:	9781119591511, 1119591511, 9781119591535, 1119591538

Table of Contents:

Cover -- Title Page -- Copyright Page -- About the Authors -- About the Technical Editors -- Acknowledgments -- Contents at a Glance -- Contents -- Introduction -- What Does This Book Cover? -- Reader Support for This Book
Part I Getting Started -- Chapter 1 What Is Machine Learning? -- Discovering Knowledge in Data -- Introducing Algorithms -- Artificial Intelligence, Machine Learning, and Deep Learning -- Machine Learning Techniques -- Supervised Learning -- Unsupervised Learning -- Model Selection -- Classification Techniques -- Regression Techniques -- Similarity Learning Techniques -- Model Evaluation -- Classification Errors -- Regression Errors -- Types of Error -- Partitioning Datasets -- Holdout Method -- Cross-Validation Methods -- Exercises
Chapter 2 Introduction to R and RStudio -- Welcome to R -- R and RStudio Components -- The R Language -- RStudio -- RStudio Desktop -- RStudio Server -- Exploring the RStudio Environment -- R Packages -- The CRAN Repository -- Installing Packages -- Loading Packages -- Package Documentation -- Writing and Running an R Script -- Data Types in R -- Vectors -- Testing Data Types -- Converting Data Types -- Missing Values -- Exercises
Chapter 3 Managing Data -- The Tidyverse -- Data Collection -- Key Considerations -- Collecting Ground Truth Data -- Data Relevance -- Quantity of Data -- Ethics -- Importing the Data -- Reading Comma-Delimited Files -- Reading Other Delimited Files -- Data Exploration -- Describing the Data -- Instance -- Feature -- Dimensionality -- Sparsity and Density -- Resolution -- Descriptive Statistics -- Visualizing the Data -- Comparison -- Relationship -- Distribution -- Composition -- Data Preparation -- Cleaning the Data -- Missing Values -- Noise -- Outliers -- Class Imbalance -- Transforming the Data -- Normalization -- Discretization -- Dummy Coding -- Reducing the Data -- Sampling -- Dimensionality Reduction -- Exercises
Part II Regression -- Chapter 4 Linear Regression -- Bicycle Rentals and Regression -- Relationships Between Variables -- Correlation -- Regression -- Simple Linear Regression -- Ordinary Least Squares Method -- Simple Linear Regression Model -- Evaluating the Model -- Residuals -- Coefficients -- Diagnostics -- Multiple Linear Regression -- The Multiple Linear Regression Model -- Evaluating the Model -- Residual Diagnostics -- Influential Point Analysis -- Multicollinearity -- Improving the Model -- Considering Nonlinear Relationships -- Considering Categorical Variables -- Considering Interactions Between Variables -- Selecting the Important Variables -- Strengths and Weaknesses -- Case Study: Predicting Blood Pressure -- Importing the Data -- Exploring the Data -- Fitting the Simple Linear Regression Model -- Fitting the Multiple Linear Regression Model -- Exercises
Chapter 5 Logistic Regression -- Prospecting for Potential Donors -- Classification -- Logistic Regression -- Odds Ratio -- Binomial Logistic Regression Model -- Dealing with Missing Data -- Dealing with Outliers -- Splitting the Data -- Dealing with Class Imbalance -- Training a Model -- Evaluating the Model -- Coefficients -- Diagnostics -- Predictive Accuracy -- Improving the Model -- Dealing with Multicollinearity -- Choosing a Cutoff Value -- Strengths and Weaknesses -- Case Study: Income Prediction -- Importing the Data -- Exploring and Preparing the Data -- Training the Model -- Evaluating the Model -- Exercises
Part III Classification -- Chapter 6 k-Nearest Neighbors -- Detecting Heart Disease -- k-Nearest Neighbors -- Finding the Nearest Neighbors -- Labeling Unlabeled Data -- Choosing an Appropriate k -- k-Nearest Neighbors Model -- Dealing with Missing Data -- Normalizing the Data -- Dealing with Categorical Features -- Splitting the Data -- Classifying Unlabeled Data -- Evaluating the Model -- Improving the Model -- Strengths and Weaknesses -- Case Study: Revisiting the Donor Dataset -- Importing the Data -- Exploring and Preparing the Data -- Dealing with Missing Data -- Normalizing the Data -- Splitting and Balancing the Data -- Building the Model -- Evaluating the Model -- Exercises
Chapter 7 Naïve Bayes -- Classifying Spam Email -- Naïve Bayes -- Probability -- Joint Probability -- Conditional Probability -- Classification with Naïve Bayes -- Additive Smoothing -- Naïve Bayes Model -- Evaluating the Model -- Strengths and Weaknesses of the Naïve Bayes Classifier -- Case Study: Revisiting the Heart Disease Detection Problem -- Importing the Data -- Exploring and Preparing the Data -- Building the Model -- Evaluating the Model -- Exercises
Chapter 8 Decision Trees -- Predicting Build Permit Decisions -- Decision Trees -- Recursive Partitioning -- Entropy -- Information Gain -- Gini Impurity -- Pruning -- Building a Classification Tree Model -- Splitting the Data -- Training a Model -- Evaluating the Model -- Strengths and Weaknesses of the Decision Tree Model -- Case Study: Revisiting the Income Prediction Problem -- Importing the Data -- Exploring and Preparing the Data -- Building the Model -- Evaluating the Model -- Exercises
Part IV Evaluating and Improving Performance -- Chapter 9 Evaluating Performance -- Estimating Future Performance -- Cross-Validation -- k-Fold Cross-Validation -- Leave-One-Out Cross-Validation -- Random Cross-Validation -- Bootstrap Sampling -- Beyond Predictive Accuracy -- Kappa -- Precision and Recall -- Sensitivity and Specificity -- Visualizing Model Performance -- Receiver Operating Characteristic Curve -- Area Under the Curve -- Exercises
Chapter 10 Improving Performance -- Parameter Tuning -- Automated Parameter Tuning -- Customized Parameter Tuning -- Ensemble Methods -- Bagging -- Boosting -- Stacking -- Exercises
Part V Unsupervised Learning -- Chapter 11 Discovering Patterns with Association Rules -- Market Basket Analysis -- Association Rules -- Identifying Strong Rules -- Support -- Confidence -- Lift -- The Apriori Algorithm -- Discovering Association Rules -- Generating the Rules -- Evaluating the Rules -- Strengths and Weaknesses -- Case Study: Identifying Grocery Purchase Patterns -- Importing the Data -- Exploring and Preparing the Data -- Generating the Rules -- Evaluating the Rules -- Exercises -- Notes
Chapter 12 Grouping Data with Clustering -- Clustering -- k-Means Clustering -- Segmenting Colleges with k-Means Clustering -- Creating the Clusters -- Analyzing the Clusters -- Choosing the Right Number of Clusters -- The Elbow Method -- The Average Silhouette Method -- The Gap Statistic -- Strengths and Weaknesses of k-Means Clustering -- Case Study: Segmenting Shopping Mall Customers -- Exploring and Preparing the Data -- Clustering the Data -- Evaluating the Clusters -- Exercises -- Note
Index -- EULA