API usage pattern mining
| Title: | API usage pattern mining |
|---|---|
| Patent Number: | 9,104,525 |
| Publication Date: | August 11, 2015 |
| Appl. No: | 13/746622 |
| Application Filed: | January 22, 2013 |
| Abstract: | Techniques for mining API method usage patterns from source code are described. These techniques include parsing the source code to generate API method call sequences that include an API method. These call sequences are clustered to obtain clusters. Based on the clusters, frequent closed sequences are determined and then clustered to obtain an API usage pattern. In addition, optimal clustering parameters may also be determined. In some instances, a graphical representation is generated based on the API usage pattern in response to a query associated with the API method. |
| Inventors: | Microsoft Corporation (Redmond, WA, US) |
| Assignees: | Microsoft Technology Licensing, LLC (Redmond, WA, US) |
| Claim: | 1. A computer-implemented method for Application Programming Interface (API) method usage data mining, the method comprising: generating API call sequences associated with an API method from a code file; clustering the API call sequences to generate multiple clusters; determining multiple frequent closed sequences of the multiple clusters; and clustering the multiple frequent closed sequences to generate an API usage pattern associated with the API method. |
| Claim: | 2. The computer-implemented method of claim 1, wherein the clustering the API call sequences comprises clustering the API call sequences based on similarities among the API call sequences. |
| Claim: | 3. The computer-implemented method of claim 1, wherein the clustering the API call sequences comprises clustering the API call sequences using an n-gram algorithm. |
| Claim: | 4. The computer-implemented method of claim 1, wherein the determining the multiple frequent closed sequences comprises: determining multiple frequent closed sequences of individual clusters of the multiple clusters, and merging the frequent closed sequences of the individual clusters to obtain the multiple frequent closed sequences of the multiple clusters. |
| Claim: | 5. The computer-implemented method of claim 4, wherein the multiple frequent closed sequences of the individual clusters is determined by implementing a BI-Directional-Extension-based frequent closed sequence mining (BIDE) algorithm to the individual cluster. |
| Claim: | 6. The computer-implemented method of claim 1, wherein the clustering the multiple frequent closed sequences to generate the API usage pattern associated with the API method comprising clustering the multiple frequent closed sequences to generate multiple API usage patterns associated with the API method. |
| Claim: | 7. The computer-implemented method of claim 6, further comprising: selecting a coefficient for the clustering the API call sequences and selecting an additional coefficient for the clustering the multiple frequent closed sequences. |
| Claim: | 8. The computer-implemented method of claim 7, wherein the coefficient and the additional coefficient are selected such as to: increase dissimilarities among the multiple API usage patterns, and decrease a number of the multiple API usage patterns. |
| Claim: | 9. The computer-implemented method of claim 1, wherein the API method is a first API method, and wherein the method further comprises generating a probabilistic graph of the API usage pattern, the probabilistic graph including: a directed line that connects a second API method and a third API method that are included in the API usage pattern, and a probability indicator associated with the directed line. |
| Claim: | 10. The computer-implemented method of claim 9, wherein the directed line indicates that the second API method is called after the third API method, and the probability indicator indicates a probability that the second API method is called after the third API method. |
| Claim: | 11. The computer-implemented method of claim 1, further comprising: receiving a query for usage of the API method; and returning a result including a probabilistic graph and a code snippet that are associated with the API usage pattern. |
| Claim: | 12. One or more computer storage media storing computer-executable instructions that are executable by one or more processors to cause the one or more processors to perform acts comprising: receiving a query for usage of an API method; generating API call sequences based on a codebase, individual API call sequences of the API call sequences including the API method; and mining the API call sequences to generate an API usage pattern for the API method using a frequent closed sequence mining algorithm, wherein the mining the API call sequences to generate the API usage pattern for the API method comprises: clustering the API call sequences to generate first multiple clusters based on similarities among the API call sequences, identifying multiple frequent closed sequences of the first multiple clusters, and clustering the multiple frequent closed sequences to generate the API usage pattern for the API method. |
| Claim: | 13. The one or more computer storage media of claim 12, wherein the acts further comprise returning a probabilistic graphical representation indicating the API usage pattern. |
| Claim: | 14. The one or more computer storage media of claim 12, wherein mining the API call sequences to generate the API usage pattern for the API method further comprises: selecting a coefficient for the clustering the API call sequences to generate first multiple clusters; and selecting an additional coefficient for the clustering the multiple frequent closed sequences. |
| Claim: | 15. The one or more computer storage media of claim 14, wherein the coefficient and the additional coefficient are selected such as to: increase dissimilarities among the multiple API usage patterns, and reduce a number of the multiple API usage patterns. |
| Claim: | 16. A system for API method usage data mining, the system comprising: one or more processors; and memory to maintain a plurality of components executable by the one or more processors, the plurality of components comprising: a parser configured to collect API call sequences of an API method from a codebase, and a miner configured to: generate first multiple clusters by clustering the API call sequences, determine multiple frequent closed sequences of the first multiple clusters, and generate second multiple clusters by clustering the multiple frequent closed sequences, individual clusters of the second multiple clusters corresponding to an API usage pattern for the API method. |
| Claim: | 17. The system of claim 16, wherein the plurality of components further comprise: an interface configured to receive a query for usage of the API method and a result including the API usage pattern, and a result generator configured to generate a probability graph of the API usage pattern. |
| Claim: | 18. The system of claim 16, wherein the determining the multiple frequent closed sequences of the first multiple clusters comprises: determining multiple frequent closed sequences of individual cluster of the first multiple clusters by implementing a frequent closed sequence mining algorithm to the individual cluster, and merging the multiple frequent closed sequences of individual cluster to obtain the multiple frequent closed sequences of the first multiple clusters. |
| Claim: | 19. The system of claim 16, wherein the first multiple clusters are generated using a first coefficient, the second multiple clusters are generated using a second coefficient, and the first coefficient and the second coefficient are selected such as to: increase dissimilarities among the second multiple clusters, and decrease a number of the second multiple clusters. |
| Claim: | 20. The one or more computer storage media of claim 12, wherein the clustering the API call sequences comprises clustering the API call sequences based on similarities among the API call sequences. |
| Patent References Cited: | 7430732, September 2008, Cwalina et al.; 7945591, May 2011, Kumar et al.; 8191045, May 2012, Sankaranarayanan et al.; 2006/0168566, July 2006, Grimaldi; 2007/0288935, December 2007, Tannenbaum et al.; 2011/0029475, February 2011, Gionis et al.; 2011/0295892, December 2011, Evans et al. |
| Other References: | Srivastava et al., Web Usage Mining: Discovery and Applications of Usage patterns from Web Data, Jan. 2000, Sigkdd Explorations, vol. 1, pp. 12-23. cited by examiner PCT Search Report and Written Opinion mailed Apr. 16, 2014 for PCT Application No. PCT/US2014/011750. cited by applicant Wang, et al., “Mining Succinct and High-Coverage API Usage Patterns from Source Code”, Mining Software Repositories, IEEE Press, May 18, 2013, pp. 319-328. cited by applicant Xie, et al., “MAPO: Mining API Usages from Open Source Repositories”, Proceedings of the 2006 International Workshop on Mining Software Repositories, May 28, 2006, New York, NY, pp. 54-57. cited by applicant Acharya, et al., “Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications”, In Proceedings of 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, Sep. 3, 2007, 10 pages. cited by applicant Ammons, et al., “Mining Specifications” In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Jan. 16, 2002, 13 pages. cited by applicant Ayres, et al., “Sequential Pattern Mining using a Bitmap Representation” In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul. 23, 2002, 7 pages. cited by applicant Buse, et al., “Synthesizing API Usage Examples”, In Proceedings of 34th International Conference on Software Engineering, Jun. 2, 2012, 11 pages. cited by applicant Chatterjee, et al., “SNIFF: A Search Engine for Java Using Free-Form Queries” In Proceedings of 12th International Conference on Fundamental Approaches to Software Engineering, Mar. 22, 2009, 16 pages. cited by applicant Cheng, et al., “Identifying Bug Signatures Using Discriminative Graph Mining” In Proceedings of ACM SIGSOFT International Symposium on Software Testing and Analysis, Jul. 19, 2009, 11 pages. cited by applicant Eisenberg, et al., “Using Association Metrics to Help Users Navigate API Documentation”, In Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, Sep. 21, 2010, 8 pages. cited by applicant Lo, et al., “Classification of Software Behaviors for Failure Detection: A Discriminative Pattern Mining Approach” In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jun. 28, 2009, 9 pages. cited by applicant Lo, et al., “Efficient Mining of Iterative Patterns for Software Specification Discovery” In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 12, 2007, 10 pages. cited by applicant Lo, et al., “Efficient Mining of Recurrent Rules from a Sequence Database” In Proceedings of the 13th International Conference on Database Systems for Advanced Applications, Mar. 19, 2008, 17 pages. cited by applicant Mandelin, et al., “Jungloid Mining: Helping to Navigate the API Jungle”, In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Jun. 12, 2005, 14 pages. cited by applicant McMillan, et al., “Portfolio: Finding Relevant Functions and Their Usages” In Proceedings of the 33rd International Conference on Software Engineering, May 21, 2011, 10 pages. cited by applicant Nguyen, et al., “Graph-based Mining of Multiple Object Usage Patterns”, In Proceedings of 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, Aug. 24, 2009, 10 pages. cited by applicant Thummalapenta, et al., “PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web” In Proceedings of the Twenty-second IEEE/ACM International Conference on Automated Software Engineering, Nov. 4, 2007, 10 pages. cited by applicant Uddin, et al., “Temporal Analysis of API Usage Concepts”, In Proceedings of 34th International Conference Software Engineering, Jun. 2, 2012, 11 pages. cited by applicant Wang, et al., “An Empirical Study on the Characteristics of API Usage Patterns”, Microsoft Research Asia, Published on: Aug. 17, 2012, Available at: http://research.microsoft.com/en-us/projects/up-miner/supportingmaterials—up-miner—paper.pdf, 8 pgs. cited by applicant Wang, et al., “BIDE: Efficient Mining of Frequent Closed Sequences” In Proceedings of the 20th International Conference on Data Engineering, Mar. 30, 2004, 12 pages. cited by applicant Xie, et al., “Improving Software Reliability and Productivity via Mining Program Source Code”, In Proceedings of IEEE International Symposium on Parallel and Distributed Processing, Apr. 14, 2008, 5 pages. cited by applicant Zhong, et al., “MAPO: Mining and Recommending API Usage Patterns” In Proceedings of the 23rd European Conference on Object-Oriented Programming, Jul. 6, 2009, 26 pages. cited by applicant International Preliminary Examining Authority and Preliminary Report on Patentability for PCT Application No. PCT/US2014/011750, mailed Apr. 7, 2015, 14 pages. cited by applicant |
| Primary Examiner: | Chavis, John |
| Attorney, Agent or Firm: | Wisdom, Gregg R. Yee, Judy Minhas, Micky |
| Accession Number: | edspgr.09104525 |
| Database: | USPTO Patent Grants |
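
The abstract and claims describe clustering API call sequences by similarity (claim 3 mentions an n-gram algorithm) and presenting a usage pattern as a probabilistic graph of which method follows which (claims 9 and 10). The following is a minimal sketch of those two ideas only, not the patented implementation. Everything in it is an assumption made for illustration: Jaccard similarity over bigrams as the similarity measure, a greedy single-pass clustering with a fixed threshold, the function names, and the `File.*` and `Socket.*` call names. The frequent closed sequence mining step (BIDE) and the second clustering pass are omitted.

```python
from collections import Counter, defaultdict


def ngrams(seq, n=2):
    """Return the set of contiguous n-grams in a call sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def similarity(a, b, n=2):
    """Jaccard similarity of two sequences' n-gram sets (assumed metric)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def cluster(sequences, threshold=0.5, n=2):
    """Greedy clustering: a sequence joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for seq in sequences:
        for members in clusters:
            if similarity(seq, members[0], n) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def transition_graph(sequences):
    """Map each call to {next_call: P(next_call follows call)},
    estimated from adjacent pairs in the given sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: k / sum(c.values()) for nxt, k in c.items()}
        for cur, c in counts.items()
    }


# Hypothetical call sequences, as a parser might extract them from a codebase.
sequences = [
    ["File.open", "File.read", "File.close"],
    ["File.open", "File.read", "File.read", "File.close"],
    ["File.open", "File.write", "File.close"],
    ["Socket.connect", "Socket.send", "Socket.close"],
]

clusters = cluster(sequences, threshold=0.4)
graph = transition_graph(clusters[0])
for cur, successors in graph.items():
    for nxt, p in successors.items():
        print(f"{cur} -> {nxt}: {p:.2f}")
```

The threshold here plays a role loosely analogous to the clustering coefficient in claims 7 and 8: raising it yields more, tighter clusters, and lowering it yields fewer, broader ones.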