Probabilistic call-graph construction
| Title: | Probabilistic call-graph construction |
|---|---|
| Patent Number: | 10,719,314 |
| Publication Date: | July 21, 2020 |
| Appl. No: | 16/200,045 |
| Application Filed: | November 26, 2018 |
| Abstract: | Embodiments construct a precise and scalable call graph that models potentially incomplete object-oriented program code, including libraries. The call graph encodes the probabilities of call relationships in the graph, where the probabilities are based on context information from the program, and are adjusted based on client configurations. Embodiments derive types to associate with unknown elements, as well as probabilities for those types, from declared types of the unknown elements. Configuration information encodes sets of feature conditions that direct the weighting of the unknown element types. As embodiments propagate type tuples through the graph, the probabilities of the types for each node are recalculated based on the type/probability information for the predecessors of the node. Type/probability information joins are necessary for nodes with multiple dependencies, where the manner of the join is configurable by the client. |
| Inventors: | Oracle International Corporation (Redwood Shores, CA, US) |
| Assignees: | ORACLE INTERNATIONAL CORPORATION (Redwood Shores, CA, US) |
| Claim: | 1. A computer-executed method comprising: creating a type-propagation graph that maps call relationships based on particular computer code; wherein the type-propagation graph comprises a plurality of nodes that represent respective program variables that are referred to in the particular computer code; for each node of a set of nodes of the plurality of nodes: identifying one or more types that are associated, in the particular computer code, with a particular program variable that is represented by said each node, determining a respective probability value for each of the one or more types that are associated with the particular program variable, wherein the probability value for a particular type, of the one or more types, represents a probability that the particular program variable is of the particular type during any given execution of the particular computer code, and associating said each node with one or more type tuples, wherein each type tuple, of the one or more type tuples, includes information identifying a respective type of the one or more types and the determined probability value that was identified for the respective type; and propagating type tuples across the plurality of nodes; wherein the method is performed by one or more computing devices. |
| Claim: | 2. The method of claim 1, wherein the particular computer code is object-oriented. |
| Claim: | 3. The method of claim 1, wherein propagating type tuples across the plurality of nodes comprises: generating a generated type tuple for a particular node of the plurality of nodes; wherein the particular node is a child node of a first parent node and a second parent node in the plurality of nodes; wherein the first parent node is associated with a first type tuple comprising a particular type and a first probability value; wherein the second parent node is associated with a second type tuple comprising the particular type and a second probability value; wherein the generated type tuple for the particular node comprises the particular type and a third probability value that is based, at least in part, on the first probability value and the second probability value. |
| Claim: | 4. The method of claim 3, wherein the type-propagation graph includes information identifying, for the particular node, the first parent node and the second parent node. |
| Claim: | 5. The method of claim 3, wherein: the first parent node is further associated with a third type tuple comprising a second type and a fourth probability value; the method further comprises: determining a fifth probability value based, at least in part, on the fourth probability value; assigning, to the particular node, a second generated type tuple that comprises the second type and the fifth probability value. |
| Claim: | 6. The method of claim 5, further comprising: determining that the second parent node is not associated with any type tuple that comprises the second type; wherein, in response to determining that the second parent node is not associated with any type tuple that comprises the second type, said determining the fifth probability value is further based on a nil probability of the second parent node being associated with the second type. |
| Claim: | 7. The method of claim 5, further comprising: weighting the fourth probability value based on the third type tuple satisfying a condition of a configurable parameter, to produce a weighted fourth probability value; wherein the fourth probability value, which is used, at least partially, as a basis of said determining the fifth probability value, is the weighted fourth probability value. |
| Claim: | 8. The method of claim 1, further comprising: identifying a public function that is defined in the particular computer code; and wherein creating the type-propagation graph comprises representing, as a program variable in the type-propagation graph, a particular argument of the public function. |
| Claim: | 9. The method of claim 1, wherein: a particular node, of the set of nodes, represents a particular unknown element in the particular computer code; the particular unknown element is associated with particular one or more types; and for the particular node, determining a respective probability value for each of the particular one or more types that are associated with the particular unknown element, comprises: identifying a declared type for the particular unknown element, identifying, in configuration information, a plurality of features associated with the declared type, and based on the plurality of features, determining probabilities for the particular one or more types associated with the particular unknown element. |
| Claim: | 10. The method of claim 9, wherein: the particular unknown element is a particular argument of a public function in the particular computer code; the method further comprises: identifying one or more call sites, in the particular computer code, that invoke the public function; wherein the one or more call sites identify one or more call-site-identified types for the particular argument; and weighting probability values of the one or more identified types, of the one or more types associated with the particular program variable, differently than other types associated with the particular program variable. |
| Claim: | 11. One or more non-transitory computer-readable media storing one or more sequences of instructions that, when executed by one or more processors, cause: creating a type-propagation graph that maps call relationships based on particular computer code; wherein the type-propagation graph comprises a plurality of nodes that represent respective program variables that are referred to in the particular computer code; for each node of a set of nodes of the plurality of nodes: identifying one or more types that are associated, in the particular computer code, with a particular program variable that is represented by said each node, determining a respective probability value for each of the one or more types that are associated with the particular program variable, wherein the probability value for a particular type, of the one or more types, represents a probability that the particular program variable is of the particular type during any given execution of the particular computer code, and associating said each node with one or more type tuples, wherein each type tuple, of the one or more type tuples, includes information identifying a respective type of the one or more types and the determined probability value that was identified for the respective type; and propagating type tuples across the plurality of nodes. |
| Claim: | 12. The one or more non-transitory computer-readable media of claim 11, wherein the particular computer code is object-oriented. |
| Claim: | 13. The one or more non-transitory computer-readable media of claim 11, wherein propagating type tuples across the plurality of nodes comprises: generating a generated type tuple for a particular node of the plurality of nodes; wherein the particular node is a child node of a first parent node and a second parent node in the plurality of nodes; wherein the first parent node is associated with a first type tuple comprising a particular type and a first probability value; wherein the second parent node is associated with a second type tuple comprising the particular type and a second probability value; wherein the generated type tuple for the particular node comprises the particular type and a third probability value that is based, at least in part, on the first probability value and the second probability value. |
| Claim: | 14. The one or more non-transitory computer-readable media of claim 13, wherein the type-propagation graph includes information identifying, for the particular node, the first parent node and the second parent node. |
| Claim: | 15. The one or more non-transitory computer-readable media of claim 13, wherein: the first parent node is further associated with a third type tuple comprising a second type and a fourth probability value; the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: determining a fifth probability value based, at least in part, on the fourth probability value; assigning, to the particular node, a second generated type tuple that comprises the second type and the fifth probability value. |
| Claim: | 16. The one or more non-transitory computer-readable media of claim 15, wherein the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: determining that the second parent node is not associated with any type tuple that comprises the second type; wherein, in response to determining that the second parent node is not associated with any type tuple that comprises the second type, said determining the fifth probability value is further based on a nil probability of the second parent node being associated with the second type. |
| Claim: | 17. The one or more non-transitory computer-readable media of claim 15, wherein the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: weighting the fourth probability value based on the third type tuple satisfying a condition of a configurable parameter, to produce a weighted fourth probability value; wherein the fourth probability value, which is used, at least partially, as a basis of said determining the fifth probability value, is the weighted fourth probability value. |
| Claim: | 18. The one or more non-transitory computer-readable media of claim 11, wherein the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: identifying a public function that is defined in the particular computer code; and wherein creating the type-propagation graph comprises representing, as a program variable in the type-propagation graph, a particular argument of the public function. |
| Claim: | 19. The one or more non-transitory computer-readable media of claim 11, wherein: a particular node, of the set of nodes, represents a particular unknown element in the particular computer code; the particular unknown element is associated with particular one or more types; and for the particular node, determining a respective probability value for each of the particular one or more types that are associated with the particular unknown element, comprises: identifying a declared type for the particular unknown element, identifying, in configuration information, a plurality of features associated with the declared type, and based on the plurality of features, determining probabilities for the particular one or more types associated with the particular unknown element. |
| Claim: | 20. The one or more non-transitory computer-readable media of claim 19, wherein: the particular unknown element is a particular argument of a public function in the particular computer code; the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: identifying one or more call sites, in the particular computer code, that invoke the public function; wherein the one or more call sites identify one or more call-site-identified types for the particular argument; wherein the particular one or more types associated with the particular unknown element comprise the one or more call-site-identified types; and weighting probability values of the one or more call-site-identified types, of the particular one or more types associated with the particular unknown element, differently than other types of the particular one or more types associated with the particular unknown element. |
| Patent References Cited: | 2008/0177756, July 2008, Kosche et al.; 2011/0313548, December 2011, Taylor et al.; 2012/0036103, February 2012, Stupp et al.; 2019/0236475, August 2019, Jagota et al. |
| Other References: | Zhu et al., “Symbolic Pointer Analysis Revisited”, PLDI'04, Jun. 9-11, 2004, Washington DC, USA, 13 pages (cited by applicant); Whaley et al., “Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams”, dated Jun. 2004, 14 pages (cited by applicant); Sundaresan et al., “Practical Virtual Method Call Resolution for Java”, dated 2000, 17 pages (cited by applicant); Shivers, Olin, “Control-Flow Analysis of Higher-Order Languages”, School of Computer Science, Carnegie Mellon University, dated May 1991, 200 pages (cited by applicant); Reif et al., “Call Graph Construction for Java Libraries”, ACM, dated Nov. 2016, 13 pages (cited by applicant); Lhotak et al., “Context-sensitive points-to analysis: is it worth it?”, Sable Technical Report No. 2005-2, dated Oct. 21, 2005, 17 pages (cited by applicant); Grove et al., “Call Graph Construction in Object-Oriented Languages”, OOPSLA, 1997 Conference Proceedings, 17 pages (cited by applicant); Dwyer et al., “Probabilistic Program Analysis”, Springer International Publishing, dated 2015, 25 pages (cited by applicant); Dean et al., “Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis”, ECOOP, dated Aug. 1995, 25 pages (cited by applicant); Bacon et al., “Fast Static Analysis of C++ Virtual Function Calls”, ACM Conference on Object-Oriented Programming Systems, Languages and Applications, dated Oct. 1996, 19 pages (cited by applicant) |
| Assistant Examiner: | Huda, Mohammed N |
| Primary Examiner: | Zhen, Wei Y |
| Attorney, Agent or Firm: | Hickman Palermo Becker Bingham LLP |
| Accession Number: | edspgr.10719314 |
| Database: | USPTO Patent Grants |
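
The type-tuple mechanics described in the abstract and in claims 3, 5, 6 and 9 can be illustrated with a minimal Python sketch. It is not the patented implementation. The record says only that the manner of the join is configurable by the client, so the plain average used in `join` is an assumption, as is the weight-then-normalize scheme in `seed_unknown`. The names `TypeTuples`, `seed_unknown` and `join` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# A node's type tuples, stored as {type name: probability}.
TypeTuples = Dict[str, float]


def seed_unknown(candidates: Iterable[str],
                 weights: Optional[Dict[str, float]] = None) -> TypeTuples:
    """Give an unknown element initial type probabilities.

    Assumption: every candidate type starts with weight 1.0, a client
    configuration may override individual weights, and the result is
    normalized so the probabilities sum to 1.
    """
    weights = weights or {}
    raw = {t: weights.get(t, 1.0) for t in candidates}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("need at least one candidate with positive weight")
    return {t: w / total for t, w in raw.items()}


def join(parents: List[TypeTuples]) -> TypeTuples:
    """Combine the type tuples of all parent nodes into the child's tuples.

    Assumption: the join is a plain average. A parent that carries no tuple
    for a type contributes a nil (0.0) probability for it, as in claim 6.
    """
    if not parents:
        return {}
    all_types = set().union(*parents)  # union of the parents' type names
    return {t: sum(p.get(t, 0.0) for p in parents) / len(parents)
            for t in sorted(all_types)}


if __name__ == "__main__":
    first_parent = {"ArrayList": 0.6, "LinkedList": 0.4}
    second_parent = {"ArrayList": 1.0}  # no tuple for LinkedList
    print(join([first_parent, second_parent]))
```

With these inputs the child node receives a probability of about 0.8 for `ArrayList` and about 0.2 for `LinkedList`, up to floating-point rounding, because the second parent contributes a nil probability for `LinkedList`. A different client-selected join, such as taking the maximum, would replace only the body of `join`.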