Data analysis using SQL and Excel

A practical guide to data mining using SQL and Excel Data Analysis Using SQL and Excel, 2nd Edition shows you how to leverage the two most popular tools for data query and analysis—SQL and Excel—to perform sophisticated data analysis without the need for complex and expensive data mining tools. Writ...

Celý popis

Uloženo v:
Podrobná bibliografie
Hlavní autor: Linoff, Gordon S
Médium: E-kniha Kniha
Jazyk:angličtina
Vydáno: Indianapolis, IN John Wiley & Sons, Inc 2015
Wiley
John Wiley & Sons, Incorporated
Wiley-Blackwell
Vydání:2nd ed.
Témata:
ISBN:1119021456, 9781119021445, 1119021448, 111902143X, 9781119021452, 9781119021438
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Obsah:
  • Data analysis using SQL and Excel -- Dedication -- About the Author -- Credits -- Acknowledgments -- Contents at a Glance -- Contents -- Foreword -- Introduction -- Chapter 1. A Data Miner Looks at SQL -- Chapter 2. What's in a Table? Getting Started with Data Exploration -- Chapter 3. How Different Is Different? -- Chapter 4. Where Is It All Happening? Location, Location, Location -- Chapter 5. It's a Matter of Time -- Chapter 6. How Long Will Customers Last? Survival Analysis to Understand Customers and Their Value -- Chapter 7. Factors Affecting Survival: The What and Why of Customer Tenure -- Chapter 8. Customer Purchases and Other Repeated Events -- Chapter 9. What's in a Shopping Cart? Market Basket Analysis -- Chapter 10. Association Rules and Beyond -- Chapter 11. Data Mining Models in SQL -- Chapter 12. The Best-Fit Line: Linear Regression Models -- Chapter 13. Building Customer Signatures for Further Analysis -- Chapter 14. Performance Is the Issue: Using SQL Effectively -- Appendix: Equivalent Constructs Among Databases -- Index -- EULA.
  • What Is Customer Survival by Year of Start?
  • Comparison of Zip Codes with and without Orders -- Zip Codes Not in Census File -- Profiles of Zip Codes with and without Orders -- Classifying and Comparing Zip Codes -- Geographic Hierarchies -- Wealthiest Zip Code in a State? -- Zip Code with the Most Orders in Each State -- Interesting Hierarchies in Geographic Data -- Counties -- Designated Marketing Areas -- Census Hierarchies -- Other Geographic Subdivisions -- Geography on the Web -- Calculating County Wealth -- Identifying Counties -- Measuring Wealth -- Distribution of Values of Wealth -- Which Zip Code Is Wealthiest Relative to Its County? -- County with Highest Relative Order Penetration -- Mapping in Excel -- Why Create Maps? -- It Can't Be Mapped -- Mapping on the Web -- State Boundaries on Scatter Plots of Zip Codes -- Plotting State Boundaries -- Pictures of State Boundaries -- Lessons Learned -- Chapter 5 It's a Matter of Time -- Dates and Times in Databases -- Some Fundamentals of Dates and Times in Databases -- Extracting Components of Dates and Times -- Converting to Standard Formats -- Intervals (Durations) -- Time Zones -- Calendar Table -- Starting to Investigate Dates -- Verifying That Dates Have No Times -- Comparing Counts by Date -- Order Lines Shipped and Billed -- Customers Shipped and Billed -- Number of Different Bill and Ship Dates per Order -- Counts of Orders and Order Sizes -- Items as Measured by Number of Units -- Items as Measured by Distinct Products -- Size as Measured by Dollars -- Days of the Week -- Billing Date by Day of the Week -- Changes in Day of the Week by Year -- Comparison of Days of the Week for Two Dates -- How Long Between Two Dates? -- Duration in Days -- Duration in Weeks -- Duration in Months -- How Many Mondays? -- A Business Problem about Days of the Week -- Outline of a Solution -- Solving It in SQL -- Using a Calendar Table Instead
  • When Is the Next Anniversary (or Birthday)? -- First Year Anniversary This Month -- First Year Anniversary Next Month -- Manipulating Dates to Calculate the Next Anniversary -- Year-over-Year Comparisons -- Comparisons by Day -- Adding a Moving Average Trend Line -- Comparisons by Week -- Comparisons by Month -- Month-to-Date Comparison -- Extrapolation by Days in Month -- Estimation Based on Day of Week -- Estimation Based on Previous Year -- Counting Active Customers by Day -- How Many Customers on a Given Day? -- How Many Customers Every Day? -- How Many Customers of Different Types? -- How Many Customers by Tenure Segment? -- Calculating Actives Entirely Using SQL -- Simple Chart Animation in Excel -- Order Date to Ship Date -- Order Date to Ship Date by Year -- Querying the Data -- Creating the One-Year Excel Table -- Creating and Customizing the Chart -- Lessons Learned -- Chapter 6 How Long Will Customers Last? Survival Analysis to Understand Customers and Their Value -- Background on Survival Analysis -- Life Expectancy -- Medical Research -- Examples of Hazards -- The Hazard Calculation -- Data Investigation -- Stop Flag -- Tenure -- Hazard Probability -- Visualizing Customers: Time versus Tenure -- Censoring -- Survival and Retention -- Point Estimate for Survival -- Calculating Survival for All Tenures -- Calculating Survival in SQL -- Calculating the Product of Column Values -- Adding in More Dimensions -- A Simple Customer Retention Calculation -- Comparison between Retention and Survival -- Simple Example of Hazard and Survival -- Constant Hazard -- What Happens to a Mixture? -- Constant Hazard Corresponding to Survival -- Comparing Different Groups of Customers -- Summarizing the Markets -- Stratifying by Market -- Survival Ratio -- Conditional Survival -- Comparing Survival over Time -- How Has a Particular Hazard Changed over Time?
  • Chapter 2 What's in a Table? Getting Started with Data Exploration -- What Is Data Exploration? -- Excel for Charting -- A Basic Chart: Column Charts -- Inserting the Data -- Creating the Column Chart -- Formatting the Column Chart -- Bar Charts in Cells -- Character-Based Bar Charts -- Conditional Formatting-Based Bar Charts -- Useful Variations on the Column Chart -- A New Query -- Side-by-Side Columns -- Stacked Columns -- Stacked and Normalized Columns -- Number of Orders and Revenue -- Other Types of Charts -- Line Charts -- Area Charts -- X-Y Charts (Scatter Plots) -- Sparklines -- What Values Are in the Columns? -- Histograms -- Histograms of Counts -- Cumulative Histograms of Counts -- Histograms (Frequencies) for Numeric Values -- Ranges Based on the Number of Digits, Using Numeric Techniques -- Ranges Based on the Number of Digits, Using String Techniques -- More Refined Ranges: First Digit Plus Number of Digits -- Breaking Numeric Values into Equal-Sized Groups -- More Values to Explore-Min, Max, and Mode -- Minimum and Maximum Values -- The Most Common Value (Mode) -- Calculating Mode Using Basic SQL -- Calculating Mode Using Window Functions -- Exploring String Values -- Histogram of Length -- Strings Starting or Ending with Spaces -- Handling Upper- and Lowercase -- What Characters Are in a String? -- Exploring Values in Two Columns -- What Are Average Sales by State? -- How Often Are Products Repeated within a Single Order? -- Direct Counting Approach -- Comparison of Distinct Counts to Overall Counts -- Which State Has the Most American Express Users? -- From Summarizing One Column to Summarizing All Columns -- Good Summary for One Column -- Query to Get All Columns in a Table -- Using SQL to Generate Summary Code -- Lessons Learned -- Chapter 3 How Different Is Different? -- Basic Statistical Concepts -- The Null Hypothesis
  • Confidence and Probability -- Normal Distribution -- How Different Are the Averages? -- The Approach -- Standard Deviation for Subset Averages -- Three Approaches -- Estimation Based on Two Samples -- Estimation Based on Difference -- Sampling from a Table -- Random Sample -- Repeatable Random Sample -- Proportional Stratified Sample -- Balanced Sample -- Counting Possibilities -- How Many Men? -- How Many Californians? -- Null Hypothesis and Confidence -- How Many Customers Are Still Active? -- Given the Count, What Is the Probability? -- Given the Probability, What Is the Number of Stops? -- The Rate or the Number? -- Ratios and Their Statistics -- Standard Error of a Proportion -- Confidence Interval on Proportions -- Difference of Proportions -- Conservative Lower Bounds -- Chi-Square -- Expected Values -- Chi-Square Calculation -- Chi-Square Distribution -- Chi-Square in SQL -- What States Have Unusual Affinities for Which Types of Products? -- Data Investigation -- SQL to Calculate Chi-Square Values -- Affinity Results -- What Months and Payment Types Have Unusual Affinities for Which Types of Products? -- Multidimensional Chi-Square -- Using a SQL Query -- The Results -- Lessons Learned -- Chapter 4 Where Is It All Happening? Location, Location, Location -- Latitude and Longitude -- Definition of Latitude and Longitude -- Degrees, Minutes, Seconds, and All That -- Distance between Two Locations -- Euclidian Method -- Accurate Method -- Finding All Zip Codes within a Given Distance -- Finding Nearest Zip Code in Excel -- Pictures with Zip Codes -- The Scatter Plot Map -- Who Uses Solar Power for Heating? -- Where Are the Customers? -- Census Demographics -- The Extremes: Richest and Poorest -- Median Income -- Proportion of Wealthy and Poor -- Income Similarity and Dissimilarity Using Chi-Square
  • Intro -- Data Analysis Using SQL and Excel® -- About the Author -- Credits -- Acknowledgments -- Contents at a Glance -- Contents -- Foreword -- Introduction -- Chapter 1 A Data Miner Looks at SQL -- Databases, SQL, and Big Data -- What Is Big Data? -- Relational Databases -- Hadoop and Hive -- NoSQL and Other Types of Databases -- SQL -- Picturing the Structure of the Data -- What Is a Data Model? -- What Is a Table? -- Allowing NULL Values -- Column Types -- What Is an Entity-Relationship Diagram? -- The Zip Code Tables -- Subscription Dataset -- Purchases Dataset -- Tips on Naming Things -- Picturing Data Analysis Using Dataflows -- What Is a Dataflow? -- READ: Reading a Database Table -- OUTPUT: Outputting a Table (or Chart) -- SELECT: Selecting Various Columns in the Table -- FILTER: Filtering Rows Based on a Condition -- APPEND: Appending New Calculated Columns -- UNION: Combining Multiple Datasets into One -- AGGREGATE: Aggregating Values -- LOOKUP: Looking Up Values in One Table in Another -- CROSSJOIN: Generating the Cartesian Product of Two Tables -- JOIN: Combining Two Tables Using a Key Column -- SORT: Ordering the Results of a Dataset -- Dataflows, SQL, and Relational Algebra -- SQL Queries -- What to Do, Not How to Do It -- The SELECT Statement -- A Basic SQL Query -- A Basic Summary SQL Query -- What It Means to Join Tables -- Cross-Joins: The Most General Joins -- Lookup: A Useful Join -- Equijoins -- Nonequijoins -- Outer Joins -- Other Important Capabilities in SQL -- UNION ALL -- CASE -- IN -- Window Functions -- Subqueries and Common Table Expressions Are Our Friends -- Subqueries for Naming Variables -- Subqueries for Handling Summaries -- Subqueries and IN -- Rewriting the "IN" as a JOIN -- Correlated Subqueries -- NOT IN Operator -- EXISTS and NOT EXISTS Operators -- Subqueries for UNION ALL -- Lessons Learned