Just Enough R! An Interactive Approach to Machine Learning and Analytics

Just Enough R! An Interactive Approach to Machine Learning and Analytics presents just enough of the R language, machine learning algorithms, statistical methodology, and analytics for the reader to learn how to find interesting structure in data. The approach might be called "seeing then doing...

Bibliographic details
Main author: Roiger, Richard J.
Format: E-book, Book
Language: English
Published: Boca Raton, CRC Press, 2020
Chapman & Hall/CRC
CRC Press LLC
Chapman & Hall
Edition: 1
ISBN: 9780367443207, 9780367439149, 0367443201, 036743914X
Contents:
  • Cover -- Half Title -- Title Page -- Copyright Page -- Table of Contents -- Preface -- Acknowledgment -- Author -- Chapter 1 Introduction to Machine Learning -- 1.1 Machine Learning, Statistical Analysis, and Data Science -- 1.2 Machine Learning: A First Example -- 1.2.1 Attribute-Value Format -- 1.2.2 A Decision Tree for Diagnosing Illness -- 1.3 Machine Learning Strategies -- 1.3.1 Classification -- 1.3.2 Estimation -- 1.3.3 Prediction -- 1.3.4 Unsupervised Clustering -- 1.3.5 Market Basket Analysis -- 1.4 Evaluating Performance -- 1.4.1 Evaluating Supervised Models -- 1.4.2 Two-Class Error Analysis -- 1.4.3 Evaluating Numeric Output -- 1.4.4 Comparing Models by Measuring Lift -- 1.4.5 Unsupervised Model Evaluation -- 1.5 Ethical Issues -- 1.6 Chapter Summary -- 1.7 Key Terms -- Exercises -- Chapter 2 Introduction to R -- 2.1 Introducing R And RStudio -- 2.1.1 Features of R -- 2.1.2 Installing R -- 2.1.3 Installing RStudio -- 2.2 Navigating RStudio -- 2.2.1 The Console -- 2.2.2 The Source Panel -- 2.2.3 The Global Environment -- 2.2.4 Packages -- 2.3 Where's The Data? -- 2.4 Obtaining Help and Additional Information -- 2.5 Summary -- Exercises -- Chapter 3 Data Structures and Manipulation -- 3.1 Data Type -- 3.1.1 Character Data and Factors -- 3.2 Single-Mode Data Structures -- 3.2.1 Vectors -- 3.2.2 Matrices and Arrays -- 3.3 Multimode Data Structures -- 3.3.1 Lists -- 3.3.2 Data Frames -- 3.4 Writing Your Own Functions -- 3.4.1 Writing a Simple Function -- 3.4.2 Conditional Statements -- 3.4.3 Iteration -- 3.4.4 Recursive Programming -- 3.5 Summary -- 3.6 Key Terms -- Exercises -- Chapter 4 Preparing the Data -- 4.1 A Process Model for Knowledge Discovery -- 4.2 Creating A Target Dataset -- 4.2.1 Interfacing R with the Relational Model -- 4.2.2 Additional Sources for Target Data -- 4.3 Data Preprocessing -- 4.3.1 Noisy Data
  • 4.3.2 Preprocessing With R -- 4.3.3 Detecting Outliers -- 4.3.4 Missing Data -- 4.4 Data Transformation -- 4.4.1 Data Normalization -- 4.4.2 Data Type Conversion -- 4.4.3 Attribute and Instance Selection -- 4.4.4 Creating Training and Test Set Data -- 4.4.5 Cross Validation and Bootstrapping -- 4.4.6 Large-Sized Data -- 4.5 Chapter Summary -- 4.6 Key Terms -- Exercises -- Chapter 5 Supervised Statistical Techniques -- 5.1 Simple Linear Regression -- 5.2 Multiple Linear Regression -- 5.2.1 Multiple Linear Regression: An Example -- 5.2.2 Evaluating Numeric Output -- 5.2.3 Training/Test Set Evaluation -- 5.2.4 Using Cross Validation -- 5.2.5 Linear Regression with Categorical Data -- 5.3 Logistic Regression -- 5.3.1 Transforming the Linear Regression Model -- 5.3.2 The Logistic Regression Model -- 5.3.3 Logistic Regression with R -- 5.3.4 Creating a Confusion Matrix -- 5.3.5 Receiver Operating Characteristics (ROC) Curves -- 5.3.6 The Area under an ROC Curve -- 5.4 Naïve Bayes Classifier -- 5.4.1 Bayes Classifier: An Example -- 5.4.2 Zero-Valued Attribute Counts -- 5.4.3 Missing Data -- 5.4.4 Numeric Data -- 5.4.5 Experimenting With Naïve Bayes -- 5.5 Chapter Summary -- 5.6 Key Terms -- Exercises -- Chapter 6 Tree-Based Methods -- 6.1 A Decision Tree Algorithm -- 6.1.1 An Algorithm for Building Decision Trees -- 6.1.2 C4.5 Attribute Selection -- 6.1.3 Other Methods for Building Decision Trees -- 6.2 Building Decision Trees: C5.0 -- 6.2.1 A Decision Tree for Credit Card Promotions -- 6.2.2 Data for Simulating Customer Churn -- 6.2.3 Predicting Customer Churn with C5.0 -- 6.3 Building Decision Trees: Rpart -- 6.3.1 An Rpart Decision Tree for Credit Card Promotions -- 6.3.2 Train and Test Rpart: Churn Data -- 6.3.3 Cross Validation Rpart: Churn Data -- 6.4 Building Decision Trees: J48 -- 6.5 Ensemble Techniques for Improving Performance -- 6.5.1 Bagging
  • 6.5.2 Boosting -- 6.5.3 Boosting: An Example with C5.0 -- 6.5.4 Random Forests -- 6.6 Regression Trees -- 6.7 Chapter Summary -- 6.8 Key Terms -- Exercises -- Chapter 7 Rule-Based Techniques -- 7.1 From Trees to Rules -- 7.1.1 The Spam Email Dataset -- 7.1.2 Spam Email Classification: C5.0 -- 7.2 A Basic Covering Rule Algorithm -- 7.2.1 Generating Covering Rules With JRip -- 7.3 Generating Association Rules -- 7.3.1 Confidence and Support -- 7.3.2 Mining Association Rules: An Example -- 7.3.3 General Considerations -- 7.3.4 RWeka's Apriori Function -- 7.4 Shake, Rattle, and Roll -- 7.5 Chapter Summary -- 7.6 Key Terms -- Exercises -- Chapter 8 Neural Networks -- 8.1 Feed-Forward Neural Networks -- 8.1.1 Neural Network Input Format -- 8.1.2 Neural Network Output Format -- 8.1.3 The Sigmoid Evaluation Function -- 8.2 Neural Network Training: A Conceptual View -- 8.2.1 Supervised Learning with Feed-Forward Networks -- 8.2.2 Unsupervised Clustering With Self-Organizing Maps -- 8.3 Neural Network Explanation -- 8.4 General Considerations -- 8.4.1 Strengths -- 8.4.2 Weaknesses -- 8.5 Neural Network Training: A Detailed View -- 8.5.1 The Backpropagation Algorithm: An Example -- 8.5.2 Kohonen Self-Organizing Maps: An Example -- 8.6 Building Neural Networks with R -- 8.6.1 The Exclusive-OR Function -- 8.6.2 Modeling Exclusive-OR With MLP: Numeric Output -- 8.6.3 Modeling Exclusive-OR With MLP: Categorical Output -- 8.6.4 Modeling Exclusive-OR With Neuralnet: Numeric Output -- 8.6.5 Modeling Exclusive-OR With Neuralnet: Categorical Output -- 8.6.6 Classifying Satellite Image Data -- 8.6.7 Testing For Diabetes -- 8.7 Neural Net Clustering For Attribute Evaluation -- 8.8 Time Series Analysis -- 8.8.1 Stock Market Analytics -- 8.8.2 Time Series Analysis: An Example -- 8.8.3 The Target Data -- 8.8.4 Modeling the Time Series -- 8.8.5 General Considerations
  • 8.9 Chapter Summary -- 8.10 Key Terms -- Exercises -- Chapter 9 Formal Evaluation Techniques -- 9.1 What Should Be Evaluated? -- 9.2 Tools for Evaluation -- 9.2.1 Single-Valued Summary Statistics -- 9.2.2 The Normal Distribution -- 9.2.3 Normal Distributions and Sample Means -- 9.2.4 A Classical Model for Hypothesis Testing -- 9.3 Computing Test Set Confidence Intervals -- 9.4 Comparing Supervised Models -- 9.4.1 Comparing the Performance of Two Models -- 9.4.2 Comparing the Performance of Two or More Models -- 9.5 Confidence Intervals for Numeric Output -- 9.6 Chapter Summary -- 9.7 Key Terms -- Exercises -- Chapter 10 Support Vector Machines -- 10.1 Linearly Separable Classes -- 10.2 The Nonlinear Case -- 10.3 Experimenting With Linearly Separable Data -- 10.4 Microarray Data Mining -- 10.4.1 DNA and Gene Expression -- 10.4.2 Preprocessing Microarray Data: Attribute Selection -- 10.4.3 Microarray Data Mining: Issues -- 10.5 A Microarray Application -- 10.5.1 Establishing a Benchmark -- 10.5.2 Attribute Elimination -- 10.6 Chapter Summary -- 10.7 Key Terms -- Exercises -- Chapter 11 Unsupervised Clustering Techniques -- 11.1 The K-Means Algorithm -- 11.1.1 An Example Using K-Means -- 11.1.2 General Considerations -- 11.2 Agglomerative Clustering -- 11.2.1 Agglomerative Clustering: An Example -- 11.2.2 General Considerations -- 11.3 Conceptual Clustering -- 11.3.1 Measuring Category Utility -- 11.3.2 Conceptual Clustering: An Example -- 11.3.3 General Considerations -- 11.4 Expectation Maximization -- 11.5 Unsupervised Clustering With R -- 11.5.1 Supervised Learning for Cluster Evaluation -- 11.5.2 Unsupervised Clustering For Attribute Evaluation -- 11.5.3 Agglomerative Cluster: A Simple Example -- 11.5.4 Agglomerative Clustering of Gamma-Ray Burst Data -- 11.5.5 Agglomerative Clustering Of Cardiology Patient Data
  • 11.5.6 Agglomerative Clustering Of Credit Screening Data -- 11.6 Chapter Summary -- 11.7 Key Terms -- Exercises -- Chapter 12 A Case Study in Predicting Treatment Outcome -- 12.1 Goal Identification -- 12.2 A Measure of Treatment Success -- 12.3 Target Data Creation -- 12.4 Data Preprocessing -- 12.5 Data Transformation -- 12.6 Data Mining -- 12.6.1 Two-Class Experiments -- 12.7 Interpretation and Evaluation -- 12.7.1 Should Patients Torso Rotate? -- 12.8 Taking Action -- 12.9 Chapter Summary -- Bibliography -- Appendix A: Supplementary Materials and More Datasets -- Appendix B: Statistics for Performance Evaluation -- Subject Index -- Index of R Functions -- Script Index