Bibliographic Details
| Title: |
API Usage Pattern Mining |
| Document Number: |
20140208296 |
| Publication Date: |
July 24, 2014 |
| Appl. No: |
13/746622 |
| Application Filed: |
January 22, 2013 |
| Abstract: |
Techniques for mining API method usage patterns from source code are described. These techniques include parsing the source code to generate API method call sequences that include an API method. These call sequences are clustered to obtain clusters. Based on the clusters, frequent closed sequences are determined and then clustered to obtain an API usage pattern. In addition, optimal clustering parameters may also be determined. In some instances, a graphical representation is generated based on the API usage pattern in response to a query associated with the API method. |
| Assignees: |
MICROSOFT CORPORATION (Redmond, WA, US) |
| Claim: |
1. A computer-implemented method for Application Programming Interface (API) method usage data mining, the method comprising: generating API call sequences associated with an API method from a code file; clustering the API call sequences to generate multiple clusters; determining multiple frequent closed sequences of the multiple clusters; and clustering the multiple frequent closed sequences to generate an API usage pattern associated with the API method. |
| Claim: |
2. The computer-implemented method of claim 1, wherein the clustering the API call sequences comprises clustering the API call sequences based on similarities among the API call sequences. |
| Claim: |
3. The computer-implemented method of claim 1, wherein the clustering the API call sequences comprises clustering the API call sequences using an n-gram algorithm. |
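Editorial illustration (not part of the claims): claims 2 and 3 describe clustering call sequences by similarity using an n-gram approach, but this record does not give the exact measure or clustering procedure. The sketch below assumes Jaccard similarity over bigrams of method names and a greedy single-pass grouping. The names `ngrams`, `similarity`, `cluster_sequences` and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def ngrams(seq: Sequence[str], n: int = 2) -> Set[Tuple[str, ...]]:
    """Return the set of n-grams of a call sequence (the whole sequence if shorter than n)."""
    if len(seq) < n:
        return {tuple(seq)}
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def similarity(a: Sequence[str], b: Sequence[str], n: int = 2) -> float:
    """Jaccard similarity of the two n-gram sets (an assumed measure)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


def cluster_sequences(
    seqs: Sequence[Sequence[str]], threshold: float = 0.5, n: int = 2
) -> List[List[Sequence[str]]]:
    """Greedy grouping: a sequence joins the first cluster that holds a similar member."""
    clusters: List[List[Sequence[str]]] = []
    for seq in seqs:
        for cluster in clusters:
            if any(similarity(seq, member, n) >= threshold for member in cluster):
                cluster.append(seq)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([seq])
    return clusters


calls = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["connect", "send", "disconnect"],
]
clusters = cluster_sequences(calls, threshold=0.5)
```

Here `threshold` plays a role loosely analogous to the clustering coefficient of claim 7; raising it yields more, tighter clusters.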
| Claim: |
4. The computer-implemented method of claim 1, wherein the determining the multiple frequent closed sequences comprises: determining multiple frequent closed sequences of individual clusters of the multiple clusters, and merging the frequent closed sequences of the individual clusters to obtain the multiple frequent closed sequences of the multiple clusters. |
| Claim: |
5. The computer-implemented method of claim 4, wherein the multiple frequent closed sequences of the individual clusters are determined by applying a BI-Directional-Extension-based frequent closed sequence mining (BIDE) algorithm to the individual clusters. |
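Editorial illustration (not part of the claims): a frequent closed sequence is a subsequence whose support meets a minimum threshold and that has no proper super-sequence with the same support. The sketch below is a brute-force enumeration meant only to show the definition on tiny inputs. It is not the BIDE algorithm named in claim 5, and the function names and `min_support` parameter are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if pattern occurs in seq in order, gaps allowed."""
    it = iter(seq)
    return all(item in it for item in pattern)


def frequent_closed_sequences(
    seqs: Sequence[Sequence[str]], min_support: int
) -> Dict[Tuple[str, ...], int]:
    """Brute force: exponential in sequence length, for short sequences only."""
    # Every pattern with non-zero support is a subsequence of some input sequence.
    candidates = set()
    for seq in seqs:
        for k in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add(tuple(seq[i] for i in idx))

    # Support is the number of input sequences that contain the pattern.
    support = {p: sum(is_subsequence(p, s) for s in seqs) for p in candidates}
    frequent = {p: c for p, c in support.items() if c >= min_support}

    # Keep a pattern only if no longer frequent pattern contains it with equal support.
    return {
        p: c
        for p, c in frequent.items()
        if not any(
            len(q) > len(p) and cq == c and is_subsequence(p, q)
            for q, cq in frequent.items()
        )
    }


cluster = [
    ("open", "read", "close"),
    ("open", "write", "close"),
    ("open", "read", "close"),
]
closed = frequent_closed_sequences(cluster, min_support=2)
```

With this input, `("open",)` is frequent but not closed, because `("open", "close")` contains it and has the same support of 3.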
| Claim: |
6. The computer-implemented method of claim 1, wherein the clustering the multiple frequent closed sequences to generate the API usage pattern associated with the API method comprises clustering the multiple frequent closed sequences to generate multiple API usage patterns associated with the API method. |
| Claim: |
7. The computer-implemented method of claim 6, further comprising: selecting a coefficient for the clustering the API call sequences and selecting an additional coefficient for the clustering the multiple frequent closed sequences. |
| Claim: |
8. The computer-implemented method of claim 7, wherein the coefficient and the additional coefficient are selected such as to: increase dissimilarities among the multiple API usage patterns, and decrease a number of the multiple API usage patterns. |
| Claim: |
9. The computer-implemented method of claim 1, wherein the API method is a first API method, and wherein the method further comprises generating a probabilistic graph of the API usage pattern, the probabilistic graph including: a directed line that connects a second API method and a third API method that are included in the API usage pattern, and a probability indicator associated with the directed line. |
| Claim: |
10. The computer-implemented method of claim 9, wherein the directed line indicates that the second API method is called after the third API method, and the probability indicator indicates a probability that the second API method is called after the third API method. |
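Editorial illustration (not part of the claims): claims 9 and 10 describe a probabilistic graph whose directed lines carry the probability that one API method is called after another. This record does not say how that probability is estimated. The sketch below assumes it is the relative frequency of immediate successors across the call sequences of one usage pattern; `transition_probabilities` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def transition_probabilities(
    seqs: Sequence[Sequence[str]],
) -> Dict[str, Dict[str, float]]:
    """Map each method to its immediate successors and their relative frequencies."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in seqs:
        # Count each adjacent (previous call, next call) pair.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    graph: Dict[str, Dict[str, float]] = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        graph[prev] = {nxt: c / total for nxt, c in followers.items()}
    return graph


pattern_sequences = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "close"],
]
graph = transition_probabilities(pattern_sequences)
```

Each entry `graph[a][b]` corresponds to one directed line from `a` to `b` together with its probability indicator; the probabilities on the lines leaving any one method sum to 1.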
| Claim: |
11. The computer-implemented method of claim 1, further comprising: receiving a query for usage of the API method; and returning a result including a probabilistic graph and a code snippet that are associated with the API usage pattern. |
| Claim: |
12. One or more computer-readable media storing computer-executable instructions that are executable by one or more processors to cause the one or more processors to perform acts comprising: receiving a query for usage of an API method; generating API call sequences based on a codebase, individual API call sequences of the API call sequences including the API method; and mining the API call sequences to generate an API usage pattern for the API method using a frequent closed sequence mining algorithm. |
| Claim: |
13. The one or more computer-readable media of claim 12, wherein the acts further comprise returning a probabilistic graphical representation indicating the API usage pattern. |
| Claim: |
14. The one or more computer-readable media of claim 12, wherein the mining the API call sequences to generate the API usage pattern for the API method comprises: clustering the API call sequences to generate first multiple clusters based on similarities among the API call sequences, identifying multiple frequent closed sequences of the first multiple clusters, and clustering the multiple frequent closed sequences to generate the API usage pattern for the API method. |
| Claim: |
15. The one or more computer-readable media of claim 12, wherein mining the API call sequences to generate the API usage pattern for the API method comprises: clustering the API call sequences to generate first multiple clusters using a coefficient, identifying multiple frequent closed sequences of individual cluster of the first multiple clusters, and merging the multiple frequent closed sequences of the individual cluster to obtain multiple frequent closed sequences of first multiple clusters, and clustering the multiple frequent closed sequences to generate multiple API usage patterns for the API method using an additional coefficient. |
| Claim: |
16. The one or more computer-readable media of claim 15, wherein the coefficient and the additional coefficient are selected such as to: increase dissimilarities among the multiple API usage patterns, and reduce a number of the multiple API usage patterns. |
| Claim: |
17. A system for API method usage data mining, the system comprising: one or more processors; and memory to maintain a plurality of components executable by the one or more processors, the plurality of components comprising: a parser configured to collect API call sequences of an API method from a codebase, and a miner configured to: generate first multiple clusters by clustering the API call sequences, determine multiple frequent closed sequences of the first multiple clusters, and generate second multiple clusters by clustering the multiple frequent closed sequences, individual clusters of the second multiple clusters corresponding to an API usage pattern for the API method. |
| Claim: |
18. The system of claim 17, wherein the plurality of components further comprise: an interface configured to receive a query for usage of the API method and a result including the API usage pattern, and a result generator configured to generate a probability graph of the API usage pattern. |
| Claim: |
19. The system of claim 17, wherein the determining the multiple frequent closed sequences of the first multiple clusters comprises: determining multiple frequent closed sequences of individual cluster of the first multiple clusters by implementing a frequent closed sequence mining algorithm to the individual cluster, and merging the multiple frequent closed sequences of individual cluster to obtain the multiple frequent closed sequences of the first multiple clusters. |
| Claim: |
20. The system of claim 17, wherein the first multiple clusters are generated using a first coefficient, the second multiple clusters are generated using a second coefficient, and the first coefficient and the second coefficient are selected such as to: increase dissimilarities among the second multiple clusters, and decrease a number of the second multiple clusters. |
| Current U.S. Class: |
717/123 |
| Current International Class: |
06 |
| Accession Number: |
edspap.20140208296 |
| Database: |
USPTO Patent Applications |