Principles of data integration

Principles of Data Integration is the first comprehensive textbook of data integration, covering theoretical principles and implementation issues as well as current challenges raised by the semantic web and cloud computing. The book offers a range of data integration solutions enabling you to focus...

Bibliographic Details
Main Authors: Doan, AnHai; Halevy, Alon; Ives, Zachary G.
Format: eBook, Book
Language: English
Published: Waltham, MA: Morgan Kaufmann, 2012
Elsevier Science & Technology
Edition: 1
ISBN: 9780124160446, 0124160441
Online Access: Get full text
Table of Contents:
  • Front cover -- Principles of Data Integration -- Copyright -- Dedication -- Table of Contents -- Preface -- 1 Introduction -- 1.1 What Is Data Integration? -- 1.2 Why Is It Hard? -- 1.2.1 Systems Reasons -- 1.2.2 Logical Reasons -- 1.2.3 Social and Administrative Reasons -- 1.2.4 Setting Expectations -- 1.3 Data Integration Architectures -- 1.3.1 Components of the Data Integration System -- 1.3.2 Example Data Integration Scenario -- Data Sources and Mediated Schema -- Query Processing -- 1.4 Outline of the Book -- Bibliographic Notes -- I Foundational Data Integration Techniques -- 2 Manipulating Query Expressions -- 2.1 Review of Database Concepts -- 2.1.1 Data Model -- 2.1.2 Integrity Constraints -- 2.1.3 Queries and Answers -- 2.1.4 Conjunctive Queries -- 2.1.5 Datalog Programs -- 2.2 Query Unfolding -- 2.3 Query Containment and Equivalence -- 2.3.1 Formal Definition -- 2.3.2 Containment of Conjunctive Queries -- 2.3.3 Unions of Conjunctive Queries -- 2.3.4 Conjunctive Queries with Interpreted Predicates -- 2.3.5 Conjunctive Queries with Negation -- 2.3.6 Bag Semantics, Grouping, and Aggregation -- 2.4 Answering Queries Using Views -- 2.4.1 Problem Definition -- 2.4.2 When Is a View Relevant to a Query? -- 2.4.3 The Possible Length of a Rewriting -- 2.4.4 The Bucket and MiniCon Algorithms -- The Bucket Algorithm -- The MiniCon Algorithm -- Step 1: Creating MCDs -- Step 2: Combining the MCDs -- Minimizing the Rewritings -- Constants in the Query and Views -- Computational Complexity -- 2.4.5 A Logical Approach: The Inverse-Rules Algorithm -- 2.4.6 Comparison of the Algorithms -- 2.4.7 View-Based Query Answering -- Certain Answers under the Open-World Assumption -- Certain Answers under the Closed-World Assumption -- Certain Answers for Queries with Interpreted Predicates -- Bibliographic Notes -- 3 Describing Data Sources
  • 3.1 Overview and Desiderata -- 3.2 Schema Mapping Languages -- 3.2.1 Principles of Schema Mapping Languages -- 3.2.2 Global-as-View -- Syntax and Semantics -- Reformulation in GAV -- Discussion -- 3.2.3 Local-as-View -- Syntax and Semantics -- Reformulation in LAV -- Discussion -- 3.2.4 Global-and-Local-as-View -- Syntax and Semantics -- Reformulation in GLAV -- 3.2.5 Tuple-Generating Dependencies -- Syntax and Semantics -- 3.3 Access-Pattern Limitations -- 3.3.1 Modeling Access-Pattern Limitations -- 3.3.2 Generating Executable Plans -- 3.4 Integrity Constraints on the Mediated Schema -- 3.4.1 LAV with Integrity Constraints -- 3.4.2 GAV with Integrity Constraints -- 3.5 Answer Completeness -- 3.5.1 Local Completeness -- 3.5.2 Detecting Answer Completeness -- 3.6 Data-Level Heterogeneity -- 3.6.1 Differences of Scale -- 3.6.2 Multiple References to the Same Entity -- Bibliographic Notes -- 4 String Matching -- 4.1 Problem Description -- 4.2 Similarity Measures -- 4.2.1 Sequence-Based Similarity Measures -- Edit Distance -- The Needleman-Wunsch Measure -- The Affine Gap Measure -- The Smith-Waterman Measure -- The Jaro Measure -- The Jaro-Winkler Measure -- 4.2.2 Set-Based Similarity Measures -- The Overlap Measure -- The Jaccard Measure -- The TF/IDF Measure -- 4.2.3 Hybrid Similarity Measures -- The Generalized Jaccard Measure -- The Soft TF/IDF Similarity Measure -- The Monge-Elkan Similarity Measure -- 4.2.4 Phonetic Similarity Measures -- 4.3 Scaling Up String Matching -- 4.3.1 Inverted Index Over Strings -- 4.3.2 Size Filtering -- 4.3.3 Prefix Filtering -- Selecting the Subset Intelligently -- Applying Prefix Filtering to the Jaccard Measure -- 4.3.4 Position Filtering -- 4.3.5 Bound Filtering -- 4.3.6 Extending Scaling Techniques to Other Similarity Measures -- Bibliographic Notes -- 5 Schema Matching and Mapping -- 5.1 Problem Definition
  • 5.1.1 Semantic Mappings -- 5.1.2 Semantic Matches -- 5.1.3 Schema Matching and Mapping -- 5.2 Challenges of Schema Matching and Mapping -- 5.3 Overview of Matching and Mapping Systems -- 5.3.1 Schema Matching Systems -- 5.3.2 Schema Mapping Systems -- 5.4 Matchers -- 5.4.1 Name-Based Matchers -- 5.4.2 Instance-Based Matchers -- Creating Recognizers -- Measuring the Overlap of Values -- Using Classifiers -- 5.5 Combining Match Predictions -- 5.6 Enforcing Domain Integrity Constraints -- 5.6.1 Domain Integrity Constraints -- 5.6.2 Searching the Space of Match Combinations -- Applying Constraints with A* Search -- States -- Initial State -- Goal States -- Expanding States -- Cost of Goal States -- Cost of Abstract States -- Applying Constraints with Local Propagation -- Initialization -- Iteration -- Termination -- 5.7 Match Selector -- 5.8 Reusing Previous Matches -- 5.8.1 Learning to Match -- 5.8.2 Learners -- Rule-Based Learner -- The Naive Bayes Learner -- 5.8.3 Training the Meta-Learner -- 5.9 Many-to-Many Matches -- 5.10 From Matches to Mappings -- Bibliographic Notes -- 6 General Schema Manipulation Operators -- 6.1 Model Management Operators -- 6.2 Merge -- 6.3 ModelGen -- 6.4 Invert -- 6.5 Toward Model Management Systems -- Bibliographic Notes -- 7 Data Matching -- 7.1 Problem Definition -- 7.2 Rule-Based Matching -- 7.3 Learning-Based Matching -- 7.4 Matching by Clustering -- 7.5 Probabilistic Approaches to Data Matching -- 7.5.1 Bayesian Networks -- Representing and Reasoning with Bayesian Networks -- Learning Bayesian Networks -- Bayesian Networks as Generative Models -- 7.5.2 Data Matching with Naive Bayes -- 7.5.3 Modeling Feature Correlations -- 7.5.4 Matching Mentions of Entities in Text -- 7.6 Collective Matching -- 7.6.1 Collective Matching Based on Clustering -- 7.6.2 Collectively Matching Entity Mentions in Documents
  • 7.7 Scaling Up Data Matching -- 7.7.1 Scaling Up Rule-Based Matching -- 7.7.2 Scaling Up Other Matching Methods -- Bibliographic Notes -- 8 Query Processing -- 8.1 Background: DBMS Query Processing -- 8.1.1 Choosing a Query Execution Plan -- Enumeration ("Search") -- Interesting Orders -- Cost and Cardinality Estimation -- 8.1.2 Executing a Query Plan -- Granularity of Processing -- Control Flow -- 8.2 Background: Distributed Query Processing -- 8.2.1 Data Placement and Shipment -- 8.2.2 Joining in Two Phases -- 8.3 Query Processing for Data Integration -- 8.4 Generating Initial Query Plans -- 8.5 Query Execution for Internet Data -- 8.5.1 Multithreaded, Pipelined, Dataflow Architecture -- 8.5.2 Interfacing with Autonomous Sources -- 8.5.3 Handling Failure -- 8.6 Overview of Adaptive Query Processing -- 8.7 Event-Driven Adaptivity -- 8.7.1 Handling Source Failures and Delays -- Finding Alternative Sources -- Handling Network Delays with Rescheduling -- 8.7.2 Handling Unexpected Cardinalities at Pipeline End -- Information-Gathering Query Operators -- Predetermined Reoptimization Thresholds -- Runtime Reinvocation of the Optimizer -- 8.8 Performance-Driven Adaptivity -- 8.8.1 Eddies: Queueing-Based Plan Selection -- Basic Eddies: "Lottery Scheduling" Routing -- Extended Eddies: State Modules -- Eddies That Migrate State: STAIRs -- 8.8.2 Corrective Query Processing: Cost-Based Reoptimization -- Cost reestimation -- Reoptimization -- Creating Stitch-Up Plans -- Bibliographic Notes -- 9 Wrappers -- 9.1 Introduction -- 9.1.1 The Wrapper Construction Problem -- 9.1.2 Challenges of Wrapper Construction -- 9.1.3 Categories of Solutions -- 9.2 Manual Wrapper Construction -- 9.3 Learning-Based Wrapper Construction -- 9.3.1 HLRT Wrappers -- Learning HLRT Wrappers -- 9.3.2 Stalker Wrappers -- Nested Tuple Schemas -- The Stalker Wrapper Model
  • Stalker Extraction Rules -- Learning Stalker Wrappers -- Discussion -- 9.4 Wrapper Learning without Schema -- 9.4.1 Modeling Schema TS and Program EW -- 9.4.2 Inferring Schema TS and Program EW -- Tokenizing the Target Page -- Generalizing Program EW to Match the Target Page -- Resolving an Optional Mismatch -- Resolving an Iterator Mismatch -- Reducing Runtime Complexity -- 9.5 Interactive Wrapper Construction -- 9.5.1 Interactive Labeling of Pages with Stalker -- 9.5.2 Identifying Correct Extraction Results with Poly -- 9.5.3 Creating Extraction Rules with Lixto -- Creating the Extraction Rules Visually -- Representing the Extraction Rules -- Bibliographic Notes -- 10 Data Warehousing and Caching -- 10.1 Data Warehousing -- Master Data Management -- 10.1.1 Data Warehouse Design -- 10.1.2 ETL: Extract/Transform/Load -- 10.2 Data Exchange: Declarative Warehousing -- 10.2.1 Data-Exchange Settings -- 10.2.2 Data-Exchange Solutions -- 10.2.3 Universal Solutions -- 10.2.4 Core Universal Solutions -- 10.2.5 Querying the Materialized Repository -- 10.3 Caching and Partial Materialization -- 10.4 Direct Analysis of Local, External Data -- Bibliographic Notes -- II Integration with Extended Data Representations -- 11 XML -- 11.1 Data Model -- 11.2 XML Structural and Schema Definitions -- 11.2.1 Document Type Definitions (DTDs) -- 11.2.2 XML Schema (XSD) -- 11.3 Query Language -- 11.3.1 Precursors: DOM and SAX -- 11.3.2 XPath: A Primitive for XML Querying -- 11.3.3 XQuery: Query Capabilities for XML -- Basic XQuery Structure: "FLWOR" Expressions -- for: Iteration and Binding Over Collections -- let: Assignment of Collections to Variables -- where: Evaluation of Conditions Against Bindings -- return: Output of XML Trees -- order by: Changing the Order of Returned Output -- Aggregation and Uniqueness -- Data to and from Metadata -- Functions
  • 11.4 Query Processing for XML