Just Enough R!: An Interactive Approach to Machine Learning and Analytics

Just Enough R! An Interactive Approach to Machine Learning and Analytics presents just enough of the R language, machine learning algorithms, statistical methodology, and analytics for the reader to learn how to find interesting structure in data. The approach might be called "seeing then doing..."


Bibliographic Details
Main Author: Roiger, Richard J.
Format: eBook
Language: English
Published: Boca Raton: CRC Press, 2020
Edition: 1st
ISBN: 9780367443207, 9780367439149, 0367443201, 036743914X
Online Access: Get full text
Table of Contents:
  • Cover -- Half Title -- Title Page -- Copyright Page -- Table of Contents -- Preface -- Acknowledgment -- Author -- Chapter 1 Introduction to Machine Learning -- 1.1 Machine Learning, Statistical Analysis, and Data Science -- 1.2 Machine Learning: A First Example -- 1.2.1 Attribute-Value Format -- 1.2.2 A Decision Tree for Diagnosing Illness -- 1.3 Machine Learning Strategies -- 1.3.1 Classification -- 1.3.2 Estimation -- 1.3.3 Prediction -- 1.3.4 Unsupervised Clustering -- 1.3.5 Market Basket Analysis -- 1.4 Evaluating Performance -- 1.4.1 Evaluating Supervised Models -- 1.4.2 Two-Class Error Analysis -- 1.4.3 Evaluating Numeric Output -- 1.4.4 Comparing Models by Measuring Lift -- 1.4.5 Unsupervised Model Evaluation -- 1.5 Ethical Issues -- 1.6 Chapter Summary -- 1.7 Key Terms -- Exercises -- Chapter 2 Introduction to R -- 2.1 Introducing R and RStudio -- 2.1.1 Features of R -- 2.1.2 Installing R -- 2.1.3 Installing RStudio -- 2.2 Navigating RStudio -- 2.2.1 The Console -- 2.2.2 The Source Panel -- 2.2.3 The Global Environment -- 2.2.4 Packages -- 2.3 Where's the Data? -- 2.4 Obtaining Help and Additional Information -- 2.5 Summary -- Exercises -- Chapter 3 Data Structures and Manipulation -- 3.1 Data Types -- 3.1.1 Character Data and Factors -- 3.2 Single-Mode Data Structures -- 3.2.1 Vectors -- 3.2.2 Matrices and Arrays -- 3.3 Multimode Data Structures -- 3.3.1 Lists -- 3.3.2 Data Frames -- 3.4 Writing Your Own Functions -- 3.4.1 Writing a Simple Function -- 3.4.2 Conditional Statements -- 3.4.3 Iteration -- 3.4.4 Recursive Programming -- 3.5 Summary -- 3.6 Key Terms -- Exercises -- Chapter 4 Preparing the Data -- 4.1 A Process Model for Knowledge Discovery -- 4.2 Creating a Target Dataset -- 4.2.1 Interfacing R with the Relational Model -- 4.2.2 Additional Sources for Target Data -- 4.3 Data Preprocessing -- 4.3.1 Noisy Data
  • 4.3.2 Preprocessing with R -- 4.3.3 Detecting Outliers -- 4.3.4 Missing Data -- 4.4 Data Transformation -- 4.4.1 Data Normalization -- 4.4.2 Data Type Conversion -- 4.4.3 Attribute and Instance Selection -- 4.4.4 Creating Training and Test Set Data -- 4.4.5 Cross Validation and Bootstrapping -- 4.4.6 Large-Sized Data -- 4.5 Chapter Summary -- 4.6 Key Terms -- Exercises -- Chapter 5 Supervised Statistical Techniques -- 5.1 Simple Linear Regression -- 5.2 Multiple Linear Regression -- 5.2.1 Multiple Linear Regression: An Example -- 5.2.2 Evaluating Numeric Output -- 5.2.3 Training/Test Set Evaluation -- 5.2.4 Using Cross Validation -- 5.2.5 Linear Regression with Categorical Data -- 5.3 Logistic Regression -- 5.3.1 Transforming the Linear Regression Model -- 5.3.2 The Logistic Regression Model -- 5.3.3 Logistic Regression with R -- 5.3.4 Creating a Confusion Matrix -- 5.3.5 Receiver Operating Characteristics (ROC) Curves -- 5.3.6 The Area under an ROC Curve -- 5.4 Naïve Bayes Classifier -- 5.4.1 Bayes Classifier: An Example -- 5.4.2 Zero-Valued Attribute Counts -- 5.4.3 Missing Data -- 5.4.4 Numeric Data -- 5.4.5 Experimenting with Naïve Bayes -- 5.5 Chapter Summary -- 5.6 Key Terms -- Exercises -- Chapter 6 Tree-Based Methods -- 6.1 A Decision Tree Algorithm -- 6.1.1 An Algorithm for Building Decision Trees -- 6.1.2 C4.5 Attribute Selection -- 6.1.3 Other Methods for Building Decision Trees -- 6.2 Building Decision Trees: C5.0 -- 6.2.1 A Decision Tree for Credit Card Promotions -- 6.2.2 Data for Simulating Customer Churn -- 6.2.3 Predicting Customer Churn with C5.0 -- 6.3 Building Decision Trees: Rpart -- 6.3.1 An Rpart Decision Tree for Credit Card Promotions -- 6.3.2 Train and Test Rpart: Churn Data -- 6.3.3 Cross Validation Rpart: Churn Data -- 6.4 Building Decision Trees: J48 -- 6.5 Ensemble Techniques for Improving Performance -- 6.5.1 Bagging
  • 6.5.2 Boosting -- 6.5.3 Boosting: An Example with C5.0 -- 6.5.4 Random Forests -- 6.6 Regression Trees -- 6.7 Chapter Summary -- 6.8 Key Terms -- Exercises -- Chapter 7 Rule-Based Techniques -- 7.1 From Trees to Rules -- 7.1.1 The Spam Email Dataset -- 7.1.2 Spam Email Classification: C5.0 -- 7.2 A Basic Covering Rule Algorithm -- 7.2.1 Generating Covering Rules with JRip -- 7.3 Generating Association Rules -- 7.3.1 Confidence and Support -- 7.3.2 Mining Association Rules: An Example -- 7.3.3 General Considerations -- 7.3.4 RWeka's Apriori Function -- 7.4 Shake, Rattle, and Roll -- 7.5 Chapter Summary -- 7.6 Key Terms -- Exercises -- Chapter 8 Neural Networks -- 8.1 Feed-Forward Neural Networks -- 8.1.1 Neural Network Input Format -- 8.1.2 Neural Network Output Format -- 8.1.3 The Sigmoid Evaluation Function -- 8.2 Neural Network Training: A Conceptual View -- 8.2.1 Supervised Learning with Feed-Forward Networks -- 8.2.2 Unsupervised Clustering with Self-Organizing Maps -- 8.3 Neural Network Explanation -- 8.4 General Considerations -- 8.4.1 Strengths -- 8.4.2 Weaknesses -- 8.5 Neural Network Training: A Detailed View -- 8.5.1 The Backpropagation Algorithm: An Example -- 8.5.2 Kohonen Self-Organizing Maps: An Example -- 8.6 Building Neural Networks with R -- 8.6.1 The Exclusive-OR Function -- 8.6.2 Modeling Exclusive-OR with MLP: Numeric Output -- 8.6.3 Modeling Exclusive-OR with MLP: Categorical Output -- 8.6.4 Modeling Exclusive-OR with Neuralnet: Numeric Output -- 8.6.5 Modeling Exclusive-OR with Neuralnet: Categorical Output -- 8.6.6 Classifying Satellite Image Data -- 8.6.7 Testing for Diabetes -- 8.7 Neural Net Clustering for Attribute Evaluation -- 8.8 Time Series Analysis -- 8.8.1 Stock Market Analytics -- 8.8.2 Time Series Analysis: An Example -- 8.8.3 The Target Data -- 8.8.4 Modeling the Time Series -- 8.8.5 General Considerations
  • 8.9 Chapter Summary -- 8.10 Key Terms -- Exercises -- Chapter 9 Formal Evaluation Techniques -- 9.1 What Should Be Evaluated? -- 9.2 Tools for Evaluation -- 9.2.1 Single-Valued Summary Statistics -- 9.2.2 The Normal Distribution -- 9.2.3 Normal Distributions and Sample Means -- 9.2.4 A Classical Model for Hypothesis Testing -- 9.3 Computing Test Set Confidence Intervals -- 9.4 Comparing Supervised Models -- 9.4.1 Comparing the Performance of Two Models -- 9.4.2 Comparing the Performance of Two or More Models -- 9.5 Confidence Intervals for Numeric Output -- 9.6 Chapter Summary -- 9.7 Key Terms -- Exercises -- Chapter 10 Support Vector Machines -- 10.1 Linearly Separable Classes -- 10.2 The Nonlinear Case -- 10.3 Experimenting with Linearly Separable Data -- 10.4 Microarray Data Mining -- 10.4.1 DNA and Gene Expression -- 10.4.2 Preprocessing Microarray Data: Attribute Selection -- 10.4.3 Microarray Data Mining: Issues -- 10.5 A Microarray Application -- 10.5.1 Establishing a Benchmark -- 10.5.2 Attribute Elimination -- 10.6 Chapter Summary -- 10.7 Key Terms -- Exercises -- Chapter 11 Unsupervised Clustering Techniques -- 11.1 The K-Means Algorithm -- 11.1.1 An Example Using K-Means -- 11.1.2 General Considerations -- 11.2 Agglomerative Clustering -- 11.2.1 Agglomerative Clustering: An Example -- 11.2.2 General Considerations -- 11.3 Conceptual Clustering -- 11.3.1 Measuring Category Utility -- 11.3.2 Conceptual Clustering: An Example -- 11.3.3 General Considerations -- 11.4 Expectation Maximization -- 11.5 Unsupervised Clustering with R -- 11.5.1 Supervised Learning for Cluster Evaluation -- 11.5.2 Unsupervised Clustering for Attribute Evaluation -- 11.5.3 Agglomerative Clustering: A Simple Example -- 11.5.4 Agglomerative Clustering of Gamma-Ray Burst Data -- 11.5.5 Agglomerative Clustering of Cardiology Patient Data
  • 11.5.6 Agglomerative Clustering of Credit Screening Data -- 11.6 Chapter Summary -- 11.7 Key Terms -- Exercises -- Chapter 12 A Case Study in Predicting Treatment Outcome -- 12.1 Goal Identification -- 12.2 A Measure of Treatment Success -- 12.3 Target Data Creation -- 12.4 Data Preprocessing -- 12.5 Data Transformation -- 12.6 Data Mining -- 12.6.1 Two-Class Experiments -- 12.7 Interpretation and Evaluation -- 12.7.1 Should Patients Torso Rotate? -- 12.8 Taking Action -- 12.9 Chapter Summary -- Bibliography -- Appendix A: Supplementary Materials and More Datasets -- Appendix B: Statistics for Performance Evaluation -- Subject Index -- Index of R Functions -- Script Index