Data mining for the social sciences: an introduction

We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Additionally, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible...

Bibliographic details
Main authors: Attewell, Paul; Monaghan, David
Format: E-book, Book
Language: English
Published: Oakland, Calif.: University of California Press, 2015
Edition: 1
ISBN: 9780520280984, 9780520960596, 0520960599, 9780520280977, 0520280989, 0520280970
Online access: Full text
Table of contents:
  • Data mining for the social sciences : an introduction -- Contents -- Acknowledgments -- Part 1: Concepts -- 1. What Is Data Mining? -- 2. Contrasts with the Conventional Statistical Approach -- 3. Some General Strategies Used in Data Mining -- 4. Important Stages in a Data Mining Project -- Part 2: Worked Examples -- 5. Preparing Training and Test Datasets -- 6. Variable Selection Tools -- 7. Creating New Variables Using Binning and Trees -- 8. Extracting Variables -- 9. Classifiers -- 10. Classification Trees -- 11. Neural Networks -- 12. Clustering -- 13. Latent Class Analysis and Mixture Models -- 14. Association Rules -- Conclusion -- Bibliography -- Notes -- Index.
  • Front Matter -- Table of Contents -- Acknowledgments -- 1. What Is Data Mining? -- 2. Contrasts with the Conventional Statistical Approach -- 3. Some General Strategies Used in Data Mining -- 4. Important Stages in a Data Mining Project -- 5. Preparing Training and Test Datasets -- 6. Variable Selection Tools -- 7. Creating New Variables Using Binning and Trees -- 8. Extracting Variables -- 9. Classifiers -- 10. Classification Trees -- 11. Neural Networks -- 12. Clustering -- 13. Latent Class Analysis and Mixture Models -- 14. Association Rules -- Conclusion -- Bibliography -- Notes -- Index
  • Cover -- Title -- Copyright -- Contents -- Acknowledgments -- Part 1. Concepts -- 1. What Is Data Mining? -- The Goals of This Book -- Software and Hardware for Data Mining -- Basic Terminology -- 2. Contrasts with the Conventional Statistical Approach -- Predictive Power in Conventional Statistical Modeling -- Hypothesis Testing in the Conventional Approach -- Heteroscedasticity as a Threat to Validity in Conventional Modeling -- The Challenge of Complex and Nonrandom Samples -- Bootstrapping and Permutation Tests -- Nonlinearity in Conventional Predictive Models -- Statistical Interactions in Conventional Models -- Conclusion -- 3. Some General Strategies Used in Data Mining -- Cross-Validation -- Overfitting -- Boosting -- Calibrating -- Measuring Fit: The Confusion Matrix and ROC Curves -- Identifying Statistical Interactions and Effect Heterogeneity in Data Mining -- Bagging and Random Forests -- The Limits of Prediction -- Big Data Is Never Big Enough -- 4. Important Stages in a Data Mining Project -- When to Sample Big Data -- Building a Rich Array of Features -- Feature Selection -- Feature Extraction -- Constructing a Model -- Part 2. Worked Examples -- 5. Preparing Training and Test Datasets -- The Logic of Cross-Validation -- Cross-Validation Methods: An Overview -- 6. Variable Selection Tools -- Stepwise Regression -- The LASSO -- VIF Regression -- 7. Creating New Variables Using Binning and Trees -- Discretizing a Continuous Predictor -- Continuous Outcomes and Continuous Predictors -- Binning Categorical Predictors -- Using Partition Trees to Study Interactions -- 8. Extracting Variables -- Principal Component Analysis -- Independent Component Analysis -- 9. Classifiers -- K-Nearest Neighbors -- Naive Bayes -- Support Vector Machines -- Optimizing Prediction across Multiple Classifiers -- 10. Classification Trees -- Partition Trees -- Boosted Trees and Random Forests -- 11. Neural Networks -- 12. Clustering -- Hierarchical Clustering -- K-Means Clustering -- Normal Mixtures -- Self-Organized Maps -- 13. Latent Class Analysis and Mixture Models -- Latent Class Analysis -- Latent Class Regression -- Mixture Models -- 14. Association Rules -- Conclusion -- Bibliography -- Notes -- Index
  • Frontmatter -- Contents -- Acknowledgments -- Part 1. Concepts -- 1. What Is Data Mining? -- 2. Contrasts with the Conventional Statistical Approach -- 3. Some General Strategies Used in Data Mining -- 4. Important Stages in a Data Mining Project -- Part 2. Worked Examples -- 5. Preparing Training and Test Datasets -- 6. Variable Selection Tools -- 7. Creating New Variables -- 8. Extracting Variables -- 9. Classifiers -- 10. Classification Trees -- 11. Neural Networks -- 12. Clustering -- 13. Latent Class Analysis and Mixture Models -- 14. Association Rules -- Conclusion. Where Next? -- Bibliography -- Notes -- Index