Database query processing with database clients
| Title: | Database query processing with database clients |
|---|---|
| Patent Number: | 12,105,705 |
| Publication Date: | October 01, 2024 |
| Appl. No: | 17/842230 |
| Application Filed: | June 16, 2022 |
| Abstract: | Embodiments of the present disclosure describe an approach for database query processing with database clients. According to the approach, a first set of queries are obtained from a plurality of clients in communication with a database server. A second set of queries are generated by normalizing the first set of queries. A set of access paths corresponding to the second set of queries are determined for retrieving data from at least one of the plurality of clients and the database server. Data is retrieved from at least one of the plurality of clients and the database server based on the set of access paths. |
| Inventors: | INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY, US) |
| Assignees: | INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY, US) |
| Claim: | 1. A computer-implemented method for real-time database query processing, comprising: receiving, by a client device of a group of client devices, from a server device comprising a database, an indication that the client device has been selected as a primary client device of the group of client devices; obtaining, by the client device, a first set of queries to the database from a plurality of other client devices of the group of client devices; generating, by the client device, a second set of queries by normalizing the first set of queries; determining, by the client device, a set of access paths corresponding to the second set of queries, wherein at least one access path of the set of access paths comprises at least one path for retrieving first data from the database previously retrieved and currently stored on at least one client device of the plurality of other client devices, and wherein at least one access path of the set of access paths comprises at least one path for retrieving second data currently stored in the database on the server device; determining, by the client device, based on a resource usage cost analysis, that retrieving the first data from the at least one client device of the plurality of other client devices has a lower resource usage cost than retrieving the first data from the server device; and retrieving, by the client device, based on the set of access paths, data comprising the first data from the at least one client device of the plurality of other client devices and the second data from the server device. |
| Claim: | 2. The computer-implemented method of claim 1, wherein the first set of queries comprise at least a first query and a second query and the generating the second set of queries comprises: substituting a first database object name in the first query and a second database object name in the second query with a same name to generate a normalized query; and determining the normalized query as a query of the second set of queries. |
| Claim: | 3. The computer-implemented method of claim 1, wherein the generating the second set of queries comprises: identifying a sub-query comprised in at least two of the first set of queries; and determining the sub-query as a query of the second set of queries. |
| Claim: | 4. The computer-implemented method of claim 1, wherein the determining the set of access paths corresponding to the second set of queries comprises: determining, for a query in the second set of queries, a plurality of candidate access paths and a plurality of corresponding resource usage costs by using a cost model and at least one of local statistics indicating data stored in the group of client devices and remote statistics indicating data stored in the database server; and selecting, from the plurality of candidate access paths, a candidate access path as an access path corresponding to the query based on the plurality of corresponding resource usage costs. |
| Claim: | 5. The computer-implemented method of claim 4, wherein the determining the set of access paths corresponding to the second set of queries further comprises: obtaining the remote statistics and the cost model from the database server. |
| Claim: | 6. The computer-implemented method of claim 1, wherein the resource usage cost analysis is based on at least one of I/O resource usage cost, CPU resource usage cost, or network resource usage cost of respective access paths of the set of access paths. |
| Claim: | 7. The computer-implemented method of claim 1, wherein the second set of queries comprise a first query, and the determining the set of access paths corresponding to the second set of queries comprises: identifying, from a plurality of historical queries with known access paths, a historical query that is the same as the first query; and obtaining, from the known access paths, a known access path corresponding to the historical query as an access path corresponding to the first query. |
| Claim: | 8. The computer-implemented method of claim 1, further comprising: distributing, by the client device, respective data in the retrieved data to the plurality of other client devices. |
| Claim: | 9. The computer-implemented method of claim 1, wherein the group of client devices are in secure communication for resource sharing. |
| Claim: | 10. A client device, comprising: a memory computer executable instructions; and a processor that executes at least one of the computer executable instructions to cause the client device to: receive, from a server device comprising a database, an indication that the client device has been selected as a primary client device of a group of client devices comprising the client device; obtain a first set of queries to the database from a plurality of other client devices of the group of client devices; generate a second set of queries by normalizing the first set of queries; determine a set of access paths corresponding to the second set of queries, wherein at least one access path of the set of access paths comprises at least one path for retrieving first data from the database previously retrieved and currently stored on at least one client device of the plurality of other client devices, and wherein at least one access path of the set of access paths comprises at least one path for retrieving second data currently stored in the database on the server device; determine, based on a resource usage cost analysis, that retrieving the first data from the at least one client device of the plurality of other client devices has a lower resource usage cost than retrieving the first data from the server device; and retrieving, based on the set of access paths, data comprising the first data from the at least one client device of the plurality of other client devices and the second data from the server device. |
| Claim: | 11. The system of claim 10, wherein the first set of queries comprise at least a first query and a second query and generating the second set of queries comprises: substituting a first database object name in the first query and a second database object name in the second query with a same name to generate a normalized query; and determining the normalized query as a query of the second set of queries. |
| Claim: | 12. The system of claim 10, wherein the generating the second set of queries comprises: identifying a sub-query that is comprised in at least two of the first set of queries; and determining the sub-query as a query of the second set of queries. |
| Claim: | 13. The system of claim 10, wherein the determining the set of access paths corresponding to the second set of queries comprises: determining, for a query in the second set of queries, a plurality of candidate access paths and a plurality of corresponding resource usage costs by using a cost model and at least one of local statistics indicating data stored in the group of client devices and remote statistics indicating data stored in the database server; and selecting, from the plurality of candidate access paths, a candidate access path as an access path corresponding to the query based on the plurality of corresponding resource usage costs. |
| Claim: | 14. The system of claim 13, wherein the determining the set of access paths corresponding to the second set of queries further comprises: obtaining the remote statistics and the cost model from the database server. |
| Claim: | 15. The system of claim 10, wherein the resource usage cost analysis is based on at least one of I/O resource usage cost, CPU resource usage cost, or network resource usage cost of respective access paths of the set of access paths. |
| Claim: | 16. The system of claim 10, wherein the second set of queries comprise a first query, and the determining the set of access paths corresponding to the second set of queries comprises: identifying, from a plurality of historical queries with known access paths, a historical query that is the same as the first query; and obtaining, from the known access paths, a known access path corresponding to the historical query as an access path corresponding to the first query. |
| Claim: | 17. The system of claim 10, wherein the computer executable instructions further cause the client device to: distributing respective data in the retrieved data to the plurality of other client devices. |
| Claim: | 18. The system of claim 10, wherein the group of client devices are in secure communication for resource sharing. |
| Claim: | 19. A non-transitory computer-readable medium having instructions stored thereon that, in response to execution, cause a client device comprising a processor to perform operations comprising: receiving, from a server device comprising a database, an indication that the client device has been selected as a primary client device of a group of client devices comprising the client device; obtaining a first set of queries to the database from a plurality of other client devices of the group of client devices; generating a second set of queries by normalizing the first set of queries, wherein at least one access path of the set of access paths comprises at least one path for retrieving first data from the database previously retrieved and currently stored on at least one client device of the plurality of other client devices, and wherein at least one access path of the set of access paths comprises at least one path for retrieving second data currently stored in the database on the server device; determining a set of access paths corresponding to the second set of queries, wherein at least one access path of the set of access paths comprises at least one path for retrieving first data from the database previously retrieved and currently stored on at least one client device of the plurality of other client devices, and wherein at least one access path of the set of access paths comprises at least one path for retrieving second data currently stored in the database on the server device; determining, based on a resource usage cost analysis, that retrieving the first data from the at least one client device of the plurality of other client devices has a lower resource usage cost than retrieving the first data from the server device; and retrieving, based on the set of access paths, data comprising the first data from the at least one client device of the plurality of other client devices and the second data from the server device. |
| Claim: | 20. The non-transitory computer-readable medium of claim 19, wherein the resource usage cost analysis is based on at least one of I/O resource usage cost, CPU resource usage cost, or network resource usage cost of respective access paths of the set of access paths. |
| Patent References Cited: | 9244971 (January 2016, Kalki); 9436735 (September 2016, Feng); 9569496 (February 2017, Li); 10949197 (March 2021, Wang); 11016688 (May 2021, Gray); 20040019587 (January 2004, Fuh); 20070219973 (September 2007, Cui); 20120317293 (December 2012, Gu); 20140012988 (January 2014, Kruempelmann); 20160103914 (April 2016, Im); 20160292226 (October 2016, Konik); 20170017686 (January 2017, Feng); 20170075957 (March 2017, Li); 20200349161 (November 2020, Siddiqui); 102135988 (July 2011); 106446134 (July 2019); 110321364 (October 2019); 111090672 (May 2020) |
| Other References: | International Search Report and Written Opinion, International Application No. PCT/CN2023/098626, International Filing Date Jun. 6, 2023 (cited by applicant); “Pushdown computations in PolyBase”, Microsoft Docs, Oct. 19, 2021, 9 pages, <https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-pushdown-computation?view=sql-server-ver15> (cited by applicant); Ding et al., “CIAO: An Optimization Framework for Client-Assisted Data Loading”, arXiv:2102.11793v1 [cs.DB], Feb. 23, 2021, 12 pages (cited by applicant); Mell et al., “The NIST Definition of Cloud Computing”, Recommendations of the National Institute of Standards and Technology, Special Publication 800-145, Sep. 2011, 7 pages (cited by applicant) |
| Assistant Examiner: | Bartlett, William P |
| Primary Examiner: | Allen, Brittany N |
| Attorney, Agent or Firm: | Amin, Turocy & Watson, LLP |
| Accession Number: | edspgr.12105705 |
| Database: | USPTO Patent Grants |
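
The normalization step of claim 2 and the cost-based selection of claims 4 and 6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `normalize`, `AccessPath`, and `choose_path`, the regex-based alias handling, the canonical name `t1`, and the additive cost formula are all assumptions made for illustration, and the alias handling only covers single-table queries of the form `FROM table alias`.

```python
import re
from dataclasses import dataclass

# Words that may follow a table name without being an alias (illustrative, not exhaustive).
_KEYWORDS = {"AS", "WHERE", "JOIN", "GROUP", "ORDER", "LIMIT", "ON", "HAVING", "UNION"}
_ALIAS_RE = re.compile(r"\bFROM\s+\w+\s+(\w+)", re.IGNORECASE)


def normalize(query: str, canonical: str = "t1") -> str:
    """Collapse whitespace and rename the table alias to one canonical name.

    Hypothetical stand-in for "substituting ... database object name[s] ... with a same name".
    """
    q = " ".join(query.split())
    match = _ALIAS_RE.search(q)
    if match and match.group(1).upper() not in _KEYWORDS:
        alias = match.group(1)
        # Replace the alias wherever it appears as a whole word.
        q = re.sub(rf"\b{re.escape(alias)}\b", canonical, q)
    return q


@dataclass(frozen=True)
class AccessPath:
    source: str      # e.g. "peer-client" or "server" (labels are assumptions)
    io: float        # estimated I/O cost
    cpu: float       # estimated CPU cost
    network: float   # estimated network cost

    @property
    def cost(self) -> float:
        # Assumed cost model: a plain sum of the three components.
        return self.io + self.cpu + self.network


def choose_path(candidates: list[AccessPath]) -> AccessPath:
    """Pick the candidate access path with the lowest estimated cost."""
    return min(candidates, key=lambda p: p.cost)


# Two client queries that differ only in alias name and spacing.
first_set = [
    "SELECT a.id FROM orders a WHERE a.total > 10",
    "SELECT  o.id FROM orders o WHERE o.total > 10",
]
second_set = {normalize(q) for q in first_set}

candidates = [
    AccessPath("peer-client", io=1.0, cpu=1.0, network=2.0),
    AccessPath("server", io=5.0, cpu=2.0, network=10.0),
]
best = choose_path(candidates)
print(second_set, best.source)
```

In this sketch both client queries collapse to a single normalized query, so the primary client would need only one access path for them, and the peer-client path is chosen because its summed cost is lower than the server path's.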