SQL-based graph analytics and machine learning

Bibliographic details
Title: SQL-based graph analytics and machine learning
Contributors: Zhao, Kangfei (author.), Yu, Jeffrey Xu (thesis advisor.), Chinese University of Hong Kong Graduate School. Division of Systems Engineering and Engineering Management. (degree granting institution.)
Year of publication: 2019
Collection: The Chinese University of Hong Kong: CUHK Digital Repository / 香港中文大學數碼典藏
Subjects: Relational databases, SQL (Computer program language), Machine learning, QA76.73.S67 Z64 2019eb
Description: Ph.D. thesis. To support analytics on massive graphs such as online social networks, Knowledge Graph, the Semantic Web, etc., many new graph algorithms have been designed to query graphs for specific problems, and many distributed graph processing systems have been developed to support graph querying through programming. In this thesis, we first focus on RDBMS, which has been well studied over decades for managing large datasets, and we revisit the issue of how an RDBMS can support graph processing at the SQL level. Our work is motivated by the fact that, in real applications, many relations stored in an RDBMS are closely related to a graph and need to be used together with it to query the graph, and that an RDBMS can query and manage data even as the data are updated over time. To support graph processing, we propose 4 new relational algebra operations: MM-join, MV-join, anti-join, and union-by-update. MM-join and MV-join are join operations between two matrices and between a matrix and a vector, respectively, followed by aggregation over groups, given that a matrix or vector can be represented as a relation. Both operate over a semiring, through which many graph algorithms can be supported. The anti-join removes nodes/edges from a graph when they are unnecessary for subsequent computation. The union-by-update handles value updates, as needed, for example, to compute PageRank. The 4 new relational algebra operations can be defined by the 6 basic relational algebra operations together with group-by & aggregation. We revisit SQL recursive queries, show that the 4 operations, together with the others, are guaranteed to have a fixpoint, following the techniques studied in DATALOG, and enhance the recursive WITH clause of SQL'99.
Following the shift from "semiring + while" to "relational algebra + while" enabled by the enhanced recursive SQL queries, another issue is how to process such queries based on the Gather-Apply-Scatter model, under which efficient graph processing systems can be developed. To demonstrate the efficiency, ...
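To make the MV-join and union-by-update operations described above more concrete, the following is a minimal sketch, not the thesis's implementation, of single-source shortest paths expressed as a "relational algebra + while" loop of (min, +) MV-joins, run in SQLite through Python's standard `sqlite3` module. The relation names `E(src, dst, w)` and `V(id, dist)`, the function name `sssp_by_mv_join`, and the use of SQLite's `INSERT OR REPLACE` to emulate union-by-update are illustrative assumptions. The loop also assumes the graph has no negative-weight cycles, so that a fixpoint exists.

```python
import sqlite3


def sssp_by_mv_join(edges, source):
    """Single-source shortest paths as repeated (min, +) MV-joins.

    Hypothetical sketch: E(src, dst, w) plays the role of the matrix,
    V(id, dist) the vector; these names are illustrative only.
    Assumes no negative-weight cycles, otherwise the loop never ends.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE E (src INTEGER, dst INTEGER, w REAL)")
    con.execute("CREATE TABLE V (id INTEGER PRIMARY KEY, dist REAL)")
    con.executemany("INSERT INTO E VALUES (?, ?, ?)", edges)
    con.execute("INSERT INTO V VALUES (?, 0.0)", (source,))
    while True:
        # MV-join under the (min, +) semiring: join matrix E with vector V,
        # then aggregate each E.dst group with MIN over V.dist + E.w.
        candidates = con.execute(
            """
            SELECT E.dst, MIN(V.dist + E.w)
            FROM E JOIN V ON E.src = V.id
            GROUP BY E.dst
            """
        ).fetchall()
        # Keep only candidates that reach a new vertex or shorten a distance.
        current = dict(con.execute("SELECT id, dist FROM V"))
        updates = [(v, d) for v, d in candidates
                   if v not in current or d < current[v]]
        if not updates:  # fixpoint: V no longer changes
            break
        # Emulated union-by-update: replace tuples with a matching key,
        # insert tuples whose key is not yet in V.
        con.executemany("INSERT OR REPLACE INTO V VALUES (?, ?)", updates)
    result = dict(con.execute("SELECT id, dist FROM V"))
    con.close()
    return result
```

For example, `sssp_by_mv_join([(1, 2, 4.0), (1, 3, 1.0), (3, 2, 2.0), (2, 4, 1.0)], 1)` returns `{1: 0.0, 2: 3.0, 3: 1.0, 4: 4.0}`. Replacing `MIN(V.dist + E.w)` with `SUM(V.dist * E.w)` switches the same join-then-aggregate pattern to the (+, ×) semiring used by PageRank-style iterations. The sketch recomputes the join over all of `V` in every round (naive iteration) rather than only over changed tuples.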
Document type: text
File description: electronic resource; remote; 1 online resource (xiii, 115 leaves) : illustrations (some color); computer; online resource
Language: English; Chinese
Relation: cuhk:2327359; local: ETD920200999; local: AAI27662507; local: 991039842694503407
Availability: https://julac.hosted.exlibrisgroup.com/primo-explore/search?query=addsrcrid,exact,991039842694503407,AND&tab=default_tab&search_scope=All&vid=CUHK&mode=advanced&lang=en_US
https://repository.lib.cuhk.edu.hk/en/item/cuhk-2327359
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Accession number: edsbas.B896BC9A
Database: BASE