Building an Effective Data Science Practice - A Framework to Bootstrap and Manage a Successful Data Science Practice

Gain a deep understanding of data science and the thought process needed to solve problems in that field using the required techniques, technologies and skills that go into forming an interdisciplinary team. This book will enable you to set up an effective team of engineers, data scientists, analyst...

Bibliographic details
Main authors: Vineet Raina, Srinath Krishnamurthy
Format: E-book
Language: English
Published: Berkeley, CA: Apress, an imprint of Springer Nature, 2022
Apress
Apress L. P
Edition: 1
ISBN: 9781484274187, 1484274180, 9781484274194, 1484274199
Table of contents:
  • Title Page -- Introduction -- Table of Contents -- Part I. Fundamentals -- 1. Introduction: The Data Science Process -- 2. Data Science and Your Business -- 3. Monks vs. Cowboys: Data Science Cultures -- Part II. Classes of Problems -- 4. Classification -- 5. Regression -- 6. Natural Language Processing -- 7. Clustering -- 8. Anomaly Detection -- 9. Recommendations -- 10. Computer Vision -- 11. Sequential Decision-Making -- Part III. Techniques and Technologies -- 12. Techniques and Technologies: An Overview -- 13. Data Capture -- 14. Data Preparation -- 15. Data Visualization -- 16. Machine Learning -- 17. Inference -- 18. Other Tools and Services -- 19. Reference Architecture -- 20. Monks vs. Cowboys: Praxis -- Part IV. Building Teams and Executing Projects -- 21. The Skills Framework -- 22. Building and Structuring the Team -- 23. Data Science Projects -- Index
  • Intro -- Table of Contents -- About the Authors -- About the Technical Reviewer -- Acknowledgments -- Introduction -- Part I: Fundamentals -- Chapter 1: Introduction: The Data Science Process -- What We Mean by Data Science -- The Data Science Process -- Machine Learning -- Data Capture (from the World) -- Data Preparation -- Data Visualization -- Inference -- Data Engineering -- Terminology Chaos: AI, ML, Data Science, Deep Learning, Etc. -- Conclusion -- Further Reading -- References -- Chapter 2: Data Science and Your Business -- How Data Science Fits into a Business -- Operational Optimizations -- Product Enhancements -- Strategic Insights -- Is Your Business Ready for Data Science? -- A Cautionary Tale -- In the Beginning Was the Data -- And the Data Was with… Whom Exactly? -- The Model Said "Here Am I, Send Me" -- Conclusion -- Further Reading -- References -- Chapter 3: Monks vs. Cowboys: Data Science Cultures -- The Two Cultures of Data Science -- Hybrid Cultures -- Cultural Differences -- Data Science Culture and Your Business -- The Cultural Spectrum of Data Scientists -- Theory and Experimentation in Data Science -- Data Engineering -- Conclusion -- Summary of Part 1 -- Part II: Classes of Problems -- Chapter 4: Classification -- Data Capture -- Data Preparation -- Data Visualization -- Machine Learning -- Inference -- Data Engineering -- Conclusion -- Chapter 5: Regression -- Data Capture -- Data Preparation -- Data Visualization -- Machine Learning -- Inference -- Conclusion -- Chapter 6: Natural Language Processing -- Data Capture -- Data Preparation -- Machine Learning -- Inference -- Conclusion -- Chapter 7: Clustering -- Data Capture -- Data Preparation -- Handling Missing Values -- Normalization -- Data Visualization -- Machine Learning -- Similarity of Observations -- Data Visualization Iteration -- Inference
  • Interpreting the Dendrogram -- Actionable Insights for Marketing -- Conclusion -- Further Reading -- Reference -- Chapter 8: Anomaly Detection -- Anomaly Detection Using Unlabeled Data -- Novelty Detection Using Pure Data -- Data Science Process for Anomaly Detection -- The World and Data Capture -- Data Preparation -- Data Visualization -- Box Plots -- Conditional Box Plots -- Scatter Plots -- Machine Learning -- Inference -- Anatomy of an Anomaly -- Complex Anomalies -- Collective Anomalies -- Contextual Anomalies -- Time Series -- Conclusion -- Further Reading -- References -- Chapter 9: Recommendations -- Data Capture -- Items and Interactions -- Quantifying an Interaction -- Example Data -- Data Preparation -- Normalization -- Handling Missing Values -- Data Visualization -- Machine Learning -- Clustering-Based Approach -- Inference -- End-to-End Automation -- Conclusion -- Further Reading -- References -- Chapter 10: Computer Vision -- Processing Images -- Image Classification/Regression -- Object Detection -- Datasets, Competitions, and Architectures -- Processing Videos -- Video Classification -- Object Tracking -- Data Science Process for Computer Vision -- The World and Data Capture -- Data Preparation -- Data Visualization -- Machine Learning -- Model Performance Evaluation -- Inference -- Data Engineering -- Conclusion -- Further Reading -- References -- Chapter 11: Sequential Decision-Making -- The RL Setting -- Basic Knowledge and Rules -- Training Nestor -- Episode -- Training Phases -- Past Cases -- Ongoing New Cases, with Imitation -- Supervised Exploration -- Supervised Exploitation -- Data in the RL Setting -- Data of Experts' Decisions -- Simulated Data -- Challenges in RL -- Availability of Data -- Information in Observations -- Exploration vs. Exploitation -- Data Science Process for RL -- Conclusion -- Further Reading
  • References -- Part III: Techniques and Technologies -- Chapter 12: Techniques and Technologies: An Overview -- Chapter 13: Data Capture -- Data Sources (1) -- Ingestion (2) -- Data Storage -- Data Lake (3) -- Data Warehouse (4) -- Shared File Systems (5) -- Read Data (6) -- Programmatic Access -- SQL Query Engine -- Open Source vs. Paid -- Data Engineering -- Conclusion -- Chapter 14: Data Preparation -- Handling Missing Values -- Feature Scaling -- Text Preprocessing -- Stemming -- TF-IDF -- Converting Categorical Variables into Numeric Variables -- Transforming Images -- Libraries and Tools -- Libraries -- Tools -- Data Engineering -- Conclusion -- Chapter 15: Data Visualization -- Graphs/Charts/Plots -- Legends -- Layouts -- Options -- Interactive Visualizations -- Deriving Insights from Visualizations -- Histogram -- Kernel Density Estimate Plot -- Libraries and Tools -- Libraries -- Tools -- Data Engineering -- Conclusion -- Chapter 16: Machine Learning -- Categories of Machine Learning Algorithms -- Supervised Learning -- Unsupervised Learning -- Reinforcement Learning -- Popular Machine Learning Algorithms -- Linear Regression -- Logistic Regression -- Support Vector Machine -- Decision Tree -- Random Forest -- Gradient Boosted Trees -- Artificial Neural Network -- Convolutional Neural Network -- Evaluating and Tuning Models -- Evaluating Models -- Tuning Models -- Cross-Validation -- Libraries and Tools -- Data Engineering -- Conclusion -- Further Reading -- References -- Chapter 17: Inference -- Model Release Process (1) -- Model Registry -- Model Converter -- Interexchange Format -- Target System -- Model Packaging -- Production -- Inference Server (2) -- Inference/Prediction Service -- Model Monitoring -- Mobile and Web Applications (3) -- ML Ops -- Open Source vs. Paid -- Data Engineering -- Conclusion
  • Chapter 18: Other Tools and Services -- Development Environment -- Experiment Registry -- Compute Infrastructure -- AutoML -- Purpose of AutoML -- AutoML Cautions -- Tools and Services -- Multimodal Predictive Analytics and Machine Learning -- Data Science Apps/Workflows -- Off-the-Shelf AI Services and Libraries -- When to Use -- Open Source vs. Paid -- Conclusion -- Chapter 19: Reference Architecture -- Experimentation -- Dev Environment (1) -- Data Sources (2) -- Ingestion (3) -- Core Infra (4) -- Analytics (5) -- Data Science Apps/Workflows (6) -- AutoML (7) -- From Experimentation to Production -- AI Services -- Conclusion -- Chapter 20: Monks vs. Cowboys: Praxis -- Goals of Modeling -- Estimating Truth: Simplicity of Representation -- Estimating Truth: Attribution -- Prediction: Interpretability -- Prediction: Accuracy -- Grading ML Techniques -- Cultural Differences -- Conclusion -- Summary of Part 3 -- References -- Part IV: Building Teams and Executing Projects -- Chapter 21: The Skills Framework -- The Three Dimensions of Skills -- Data Analysis Skills -- Software Engineering Skills -- Domain Expertise -- The Roles in a Data Science Team -- Citizen Data Scientist -- Data Analyst -- Data Science Technician -- ML Ops -- Data Engineer -- Data Architect -- ML Engineer -- Data Scientist -- Chief Data Scientist -- Deviations in Skills -- Conclusion -- Chapter 22: Building and Structuring the Team -- Typical Team Structures -- Small Incubation Team -- Mature Operational Team -- Team Evolution -- The Key Hire: Chief Data Scientist -- Evaluating the Culture -- Hiring vs. Getting a Consultant -- Data Engineering: Requirements and Staffing -- Notes on Upskilling -- Conclusion -- Chapter 23: Data Science Projects -- Types of Data Science Projects -- Knowledge Discovery from Data/Data Mining -- Data Science Infusion in Processes
  • Data Science Infusion in Products -- Data Science-Based Product -- Typical Traits of Data Science Projects -- KPIs -- Model Performance -- Experimentation Cycle Time -- Effort-Cost Trade-Offs -- Data Quality -- Importance of Data Quality -- Issues Arising from Poor-Quality Data -- Severity of Impact -- Dimensions of Data Quality -- Measuring Data Quality -- Ensuring Data Quality -- Resistance to Data Quality Efforts -- Data Protection and Privacy -- Encryption -- Access Controls -- Identifiable/Protected/Sensitive Information -- Federated Learning -- Legal and Regulatory Aspects -- When Are These Relevant? -- Nondiscrimination -- Explainability and Accountability -- Explainable AI: What Is an "Explanation"? -- Cognitive Bias -- Cognitive Bias and Data Science Projects -- Conclusion and Further Reading -- References -- Index