DATABASE WORKLOAD ANALYSIS AND OPTIMIZATION VISUALIZATIONS
Gespeichert in:
| Titel: | DATABASE WORKLOAD ANALYSIS AND OPTIMIZATION VISUALIZATIONS |
|---|---|
| Document Number: | 20170132296 |
| Publikationsdatum: | May 11, 2017 |
| Appl. No: | 15/345375 |
| Application Filed: | November 07, 2016 |
| Abstract: | Techniques are described for analyzing usage of data stored in a data storage system without accessing the stored data. In some embodiments, workload data indicative of queries executed at the data storage system on stored data is received. This workload data can include query logs generated during execution of the queries. The workload data is processed to identify data elements such as tables, columns, and views associated with the stored data as well as information regarding usage of the identified data elements. Usage can include operations performed on the data elements during execution of the queries. Based on this processing relationships between the identified data elements can be inferred and visualizations generated that convey information regarding usage of the data stored at the data storage system. Visualizations can include, among others, usage heatmap diagrams, join diagrams, column family diagrams, filter diagrams, view lineage diagrams, data flow diagrams, denormalization diagrams, and workload distribution diagrams. |
| Claim: | 1. A method for causing display of a usage heatmap diagram indicative of usage of data stored in a data storage system, the method comprising: receiving workload data indicative of queries executed at the data storage system on the stored data; processing the workload data to identify a plurality of tables present in the stored data, each of the identified tables including one or more columns; processing the workload data to determine relationships between the identified tables based on operations performed during execution of the queries; and causing display of the usage heatmap diagram based on the processing, the usage heatmap diagram including: a plurality of table nodes representative of the identified plurality of tables, each of the plurality of table nodes visually linked to one or more of the other plurality of table nodes based on the determined relationships; wherein each of the plurality of table nodes is displayed as a graphical object in a color corresponding to relative usage of the represented table and includes a graphical listing of a plurality of column identifiers representative of a plurality of columns present in the represented table, each of the one or more column identifiers ordered based on relative usage of the represented columns and displayed in a color corresponding to a category of operation associated with usage of the represented columns. |
| Claim: | 2. A method for facilitating analysis of the usage of data stored in a data storage system without accessing the stored data, the method comprising: receiving workload data indicative of queries executed at the data storage system on the stored data; processing the workload data to identify data elements associated with the stored data and information regarding usage of the identified data elements; inferring a logical schema defining relationships between the identified data elements based on the information regarding usage of the identified data elements; and causing display of an interactive visualization based on the inferred logical schema that conveys information regarding usage of the stored data. |
| Claim: | 3. The method of claim 1, wherein the workload data includes any of a query execution log or a data definition language (DDL) statement. |
| Claim: | 4. The method of claim 1, wherein the workload data includes timestamped machine-generated data resulting from execution of the queries. |
| Claim: | 5. The method of claim 1, wherein the identified data elements include any of a table, a column, or a view. |
| Claim: | 6. The method of claim 1, wherein the interactive visualization includes a usage heatmap diagram, the usage heatmap diagram including: a plurality of table nodes representative of a plurality of tables present in the stored data, each of the plurality of table nodes visually linked to one or more of the other plurality of table nodes based on relationships between the plurality of tables, each of the plurality of table nodes including: an identifier associated with the table; and a listing of column identifiers representative of columns present in the table, each of the column identifiers including information regarding usage of the columns. |
| Claim: | 7. The method of claim 6, wherein the identifier associated with the table includes any of: a name of the table; or an indication that the table is a fact table or a dimension table. |
| Claim: | 8. The method of claim 6, wherein a particular table node representative of a particular table is visually represented as a rectangular box and wherein column identifiers representative of columns present in the particular table are visually represented as smaller rectangular boxes enclosed in the rectangular box representative of the particular table. |
| Claim: | 9. The method of claim 6, wherein a particular table node representative of a particular table is visually represented as a graphical object of a particular color, wherein the particular color is indicative of any of: level of usage of columns present in the particular table; or a category of operation associated with usage of columns present in the particular table. |
| Claim: | 10. The method of claim 6, wherein a particular column identifier representative of a particular column present in a particular table is visually represented as a graphical object of a particular color, wherein the particular color is indicative of any of: level of usage of the particular column; or a category of operation associated with usage of the particular column. |
| Claim: | 11. The method of claim 6, wherein for a particular table node representative of a particular table, the listed column identifiers representative of columns present in the particular table are ordered based on relative usage. |
| Claim: | 12. The method of claim 7, wherein a particular column identifier representative of a particular column includes any of: a name of the particular column; an indication of a category of operation associated with usage of the particular column; or statistical information regarding usage of the particular column. |
| Claim: | 13. The method of claim 6, wherein the information regarding usage of the columns includes an indication of a category of operation associated with usage of the columns, the category of usage selected from the following categories: “select,” “filter,” “group by,” or “order by.” |
| Claim: | 14. The method of claim 6, further comprising: causing display of an option to select from a plurality of categories of operation associated with usage of the stored data; receiving a user selection of a particular category of operation via the option; and dynamically updating display of the usage heatmap diagram in response to the selected particular category of operation. |
| Claim: | 15. The method of claim 6, further comprising: causing display of statistical information regarding usage of a particular column in response to detecting a user interaction with a particular column identifier in the usage heatmap diagram that is representative of the particular column. |
| Claim: | 16. The method of claim 1, wherein the interactive visualization includes a join diagram, the join diagram including: a plurality of table nodes representative of a plurality of tables present in the stored data, each of the plurality of table nodes visually linked to one or more of the other plurality of table nodes based on inferred join relationships; wherein a particular inferred join relationship between a particular two or more of the plurality of table nodes is indicative of a level of query operations joining the particular two or more tables represented by the two or more table nodes as observed in the workload data. |
| Claim: | 17. The method of claim 16, further comprising: dynamically updating the interactive visualization to highlight the particular two or more table nodes associated with a particular inferred join relationship in response to a user interaction with any of the particular two or more table nodes. |
| Claim: | 18. The method of claim 16, further comprising: dynamically updating the interactive visualization to cause display of join key and/or a category of join operation associated with a particular inferred join relationship in response to a user interaction with any of the particular two or more table nodes. |
| Claim: | 19. The method of claim 1, wherein the interactive visualization includes a column family diagram, the column family diagram including: a graphical object representative of a table present in the stored data, the graphical object including a listing of column identifiers representative of columns present in the table and related to other columns by inferred aggregation relationships; wherein a particular inferred aggregation relationship between a particular two or more columns is indicative of a level of query operations aggregating the values of the particular two or more columns as observed in the workload data. |
| Claim: | 20. The method of claim 19, wherein the query operations aggregating values of the particular two or more columns include a “group by” query operation. |
| Claim: | 21. The method of claim 1, wherein the interactive visualization includes a filter diagram, the filter diagram including: a graphical object representative of a data structure associated with the stored data; and a line bisecting the graphical object, the lines indicative of a filter operation applied to the data structure as observed in the workload data. |
| Claim: | 22. The method of claim 21, wherein the line bisecting the graphical object includes a value identifier indicative of a value associated with the filter operation. |
| Claim: | 23. The method of claim 1, wherein the interactive visualization includes a view lineage diagram, the view lineage diagram including: a view node representative of a view upon which a query was executed at the data storage system; plurality of table nodes representative of tables upon which the view depends, each of the plurality of table nodes visually linked to the view node. |
| Claim: | 24. The method of claim 1, wherein the interactive visualization includes a data flow diagram, the data flow diagram including: a plurality of table nodes representative of a plurality of tables present in the stored data; a query node representative of one or more queries executed on the plurality of tables, the query node visually linked to the plurality of table nodes; a transient table node representative of one or more transient tables generated as part of a processing pipeline during execution of one or more queries, the transient table nodes visually linked to the query node. |
| Claim: | 25. The method of claim 24, wherein the data flow diagram further includes: a second query node representative of one or more queries executed on the one or more transient tables, the second query node visually linked to the transient table node. |
| Claim: | 26. The method of claim 1, wherein the interactive visualization includes a denormalization diagram, the denormalization diagram including: a plurality of table nodes representative of a plurality of tables present in the stored data, each of the plurality of table nodes visually linked to one or more of the other plurality of table nodes based on inferred join relationships; wherein a particular inferred join relationship between a particular subset of the plurality of table nodes is indicative of a level of query operations joining the particular subset of tables represented by the two or more table nodes as observed in the workload data; and a denormalization recommendation to combine the particular subset of table nodes into fewer tables using an alternative data storage framework. |
| Claim: | 27. The method of claim 26, wherein denormalization recommendation includes an option to generate a code template for use in migrating stored data associated with the particular subset of table nodes into the alternative data storage framework. |
| Claim: | 28. The method of claim 26, wherein the alternative data storage framework is based on any of Hadoop, Impala, or Hive. |
| Claim: | 29. The method of claim 1, wherein the interactive visualization includes a workload distribution diagram, the distribution diagram including: a visual representation of a measure of the queries indicated in the workload data split according to one or more of the following categories: originating user; originating application; originating report; query complexity; or compatibility with a target alternative data storage framework. |
| Claim: | 30. The method of claim 1, the interactive visualization includes a plurality of insights displayed via an interactive accordion box interface element, wherein each interface element is collapsible and expandable in response to user interaction, and wherein one or more of the interface elements display any of a: usage heatmap diagram; join diagram; column family diagram; filter diagram; view lineage diagram; data flow diagram; denormalization diagram; or workload distribution diagram. |
| Claim: | 31. A system for facilitating analysis of the usage of data stored in a data storage system without accessing the stored data, the system comprising: a processing unit; and a memory unit coupled to the processing unit, the memory unit having instructions stored thereon, which when executed by the processing unit, cause the system to: receive workload data indicative of queries executed at the data storage system on the stored data; process the workload data to identify data elements associated with the stored data and information regarding usage of the identified data elements; infer a logical schema defining relationships between the identified data elements based on the information regarding usage of the identified data elements; and cause display of an interactive visualization based on the inferred logical schema that conveys information regarding usage of the stored data. |
| Current International Class: | 06 |
| Dokumentencode: | edspap.20170132296 |
| Datenbank: | USPTO Patent Applications |
Schreiben Sie den ersten Kommentar!