Method and system for optimizing parallel program execution based on speculation that an object written to is not shared

Detailed Bibliography
Title: Method and system for optimizing parallel program execution based on speculation that an object written to is not shared
Patent Number: 9,766,926
Publication Date: September 19, 2017
Appl. No: 14/170506
Application Filed: January 31, 2014
Abstract: A method for executing a program in parallel includes creating a program replica, which includes a write operation on and an identifier of an object and is a copy of the program, for a thread. The identifier specifies whether the object is thread-local. The method includes modifying the write operation based on a speculation that the write operation uses only thread-local objects. The write operation executes in a transaction of the thread. The method includes determining, while executing the program replica and using the identifier, that the object used by the write operation is not thread-local, de-optimizing the write operation by adding instrumentation to implement a software transactional memory (STM) system for the write operation to obtain a de-optimized write operation, and performing the de-optimized write operation on the object to obtain a result and store the result in a redo log.
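Illustrative note: the flow described in the abstract can be sketched roughly as follows. This is a minimal, single-threaded sketch and not the patented implementation; the names `Obj`, `Transaction`, and `WriteSite` are hypothetical, a boolean flag stands in for the identifier, and swapping a method reference stands in for de-optimizing compiled code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Obj:
    """Hypothetical object; `thread_local` stands in for the identifier."""
    thread_local: bool
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Transaction:
    """Per-thread transaction that buffers shared writes in a redo log."""
    redo_log: List[Tuple[Obj, str, Any]] = field(default_factory=list)

    def commit(self) -> None:
        # Replay the buffered writes onto the real objects in program order.
        for obj, name, value in self.redo_log:
            obj.props[name] = value
        self.redo_log.clear()


class WriteSite:
    """One write operation inside a program replica (hypothetical model)."""

    def __init__(self) -> None:
        self.deoptimized = False
        self._impl = self._speculative_write

    def __call__(self, tx: Transaction, obj: Obj, name: str, value: Any) -> None:
        self._impl(tx, obj, name, value)

    def _speculative_write(self, tx: Transaction, obj: Obj, name: str, value: Any) -> None:
        if obj.thread_local:
            obj.props[name] = value  # speculation holds: write in place
            return
        # Speculation failed: switch this site to the instrumented path.
        self.deoptimized = True
        self._impl = self._instrumented_write
        self._instrumented_write(tx, obj, name, value)

    def _instrumented_write(self, tx: Transaction, obj: Obj, name: str, value: Any) -> None:
        if obj.thread_local:
            obj.props[name] = value  # thread-local objects need no logging
        else:
            tx.redo_log.append((obj, name, value))  # deferred until commit
```

In this sketch a write to a shared object stays invisible to other code until `Transaction.commit()` replays the redo log; validation and conflict handling are deliberately left out.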
Inventors: Oracle International Corporation (Redwood Shores, CA, US)
Assignees: Oracle International Corporation (Redwood Shores, CA, US)
Claim: 1. A method for parallel execution of programs, comprising: while executing a program in parallel: creating a first program replica for a first thread, wherein the first program replica comprises a first intermediate representation of at least a portion of the program, and wherein the first intermediate representation comprises: a write operation that writes to a property of a first object that is identified by a first identifier specifying that the first object is either a shared object or a thread local object; performing a first speculation that the first object is a thread-local object; performing a first optimization on the write operation based on the first speculation, wherein performing the first optimization comprises modifying the first intermediate representation to include compiled machine code that allows the write operation to directly write to the property of the first object; making a determination, using the first identifier, that the first object is either a shared object or a thread-local object; when the first object is a thread-local object, performing the write operation by writing directly to the property of first object; and when the first object is a shared object: de-optimizing the write operation, based on the determination, by undoing the first optimization to obtain a de-optimized write operation; adding instrumentation to the first intermediate representation to implement a software transactional memory (STM) system for the de-optimized write operation, wherein adding instrumentation to the intermediate representation comprises creating a lazy clone of the first object, the lazy clone having initially no properties; performing the de-optimized write operation by copying the property of the first object to the lazy clone, and writing to the property of the lazy clone to obtain a modified lazy clone; and committing the property of the modified lazy clone by copying the property of the modified lazy clone to the property of the first object.
Claim: 2. The method of claim 1, wherein performing the determination comprises examining a shape of the first object, wherein the shape of the first object comprises a record of a plurality of properties of the first object, a record of a plurality of methods of the first object, and the first identifier.
Claim: 3. The method of claim 1, further comprising: creating a second program replica for a second thread, wherein the second program replica comprises a second intermediate representation of at least a portion of the program, and wherein the second intermediate representation comprises: a read operation, and a second object identified by a second identifier specifying that the second object is a second thread-local object, and performing a second speculation that the read operation operates only on the second thread-local object; performing a second optimization on the read operation based on the second speculation; making a second determination that the read operation will operate on the second object; making a third determination, using the second identifier, that the second object is the second thread local object; and performing, based on the third determination, the read operation on a property of the second object.
Claim: 4. The method of claim 1, further comprising: performing a validation, using the modified lazy clone, that the execution of the program replica was atomic and isolated; and committing the property of the modified lazy clone when the validation is successful.
Claim: 5. The method of claim 1, wherein the first intermediate representation is one selected from a group consisting of an abstract syntax tree and bytecode.
Claim: 6. The method of claim 1, wherein the compiled version of write operation comprises using a fixed offset to access one of a plurality of properties of the first object.
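Illustrative note: claims 2 and 6 together suggest a shape that records an object's properties, its methods, and the identifier, plus a compiled write that reaches a property through a fixed offset. The sketch below is one possible reading, not the patent's implementation; `Shape`, `ShapedObject`, `compile_write`, and `SpeculationFailed` are hypothetical names, and using a single shape identity check to cover both layout and identifier is an assumption made here for brevity.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


class SpeculationFailed(Exception):
    """Raised when an object does not match the shape the write was compiled for."""


@dataclass(frozen=True)
class Shape:
    property_names: Tuple[str, ...]  # position in the tuple is the slot offset
    methods: Tuple[str, ...]
    thread_local: bool               # stands in for the identifier

    def offset_of(self, name: str) -> int:
        return self.property_names.index(name)


class ShapedObject:
    def __init__(self, shape: Shape, values: List[Any]) -> None:
        self.shape = shape
        self.slots = list(values)    # property values stored by offset


def compile_write(shape: Shape, name: str) -> Callable[[ShapedObject, Any], None]:
    """Build a specialized writer for thread-local objects of `shape`."""
    if not shape.thread_local:
        raise ValueError("the fast path is only compiled for thread-local shapes")
    offset = shape.offset_of(name)   # resolved once, at "compile" time

    def write(obj: ShapedObject, value: Any) -> None:
        # One identity check covers both the layout and the identifier.
        if obj.shape is not shape:
            raise SpeculationFailed(name)
        obj.slots[offset] = value    # fixed-offset store, no name lookup

    return write
```

A caller would catch `SpeculationFailed` and fall back to an instrumented write, which is where de-optimization would take over.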
Claim: 7. A system for parallel execution of programs, the system comprising: a data repository for storing a plurality of program replicas; a plurality of threads each executing on one of a plurality of processors; a software transactional memory (STM) system; a speculative runtime engine configured to: create a first program replica of the plurality of program replicas for a first thread of the plurality of threads, wherein the first program replica comprises a first intermediate representation of at least a portion of the program, and wherein the first intermediate representation comprises: a write operation that writes to a property of a first object that is identified by a first identifier specifying that the first object is either a first shared object or a thread local object; perform a first speculation that the first object is a thread-local object; perform a first optimization on the write operation based on the first speculation, wherein performing the first optimization comprises modifying the first intermediate representation to include compiled machine code that allows the write operation to directly write to the property of the first object; make a determination, using the first identifier, that the first object is either a first shared object or a first thread-local object; when the first object is a thread-local object, perform the write operation by writing directly to the property of first object; and when the first object is a shared object: de-optimize the write operation, based on the determination, by undoing the first optimization to obtain a de-optimized write operation; add instrumentation to the first intermediate representation to implement the STM system for the de-optimized write operation, wherein adding instrumentation to the intermediate representation comprises creating a first lazy clone of the first object, the lazy clone having initially no properties; an interpreter configured to perform the de-optimized write operation by: copying the property of the first object to the lazy clone, and writing to the first property of the first lazy clone to obtain a first modified lazy clone; and commit the property of the modified lazy clone by copying the property of the modified lazy clone to the property of the first object.
Claim: 8. The system of claim 7, wherein the speculative runtime engine is further configured to perform the determination by examining a shape of the first object, wherein the shape comprises a record of a plurality of properties of the first object, a record of a plurality of methods of the first object, and the first identifier.
Claim: 9. The system of claim 7, wherein the speculative runtime engine is further configured to: create a second program replica for a second thread, wherein the second program replica comprises a second intermediate representation of at least a portion of the program, and wherein the second intermediate representation comprises: a read operation, and a second object identified by a second identifier specifying that the second object is a second shared object, and wherein the second program replica corresponds to a second copy of the program; perform a second speculation that the read operation operates only on a second thread-local object; perform a second optimization on the read operation based on the second speculation; make a second determination that the read operation will operate on the second object; make a third determination, using the second identifier, that the second object is the second shared object; and de-optimize the read operation, based on the third determination, by undoing the second optimization to obtain a de-optimized read operation; add instrumentation to the second intermediate representation to implement the STM system for the de-optimized read operation, wherein adding instrumentation to the intermediate representation comprises creating a second lazy clone of the second object; and wherein the STM system is further configured to: perform the de-optimized read operation by copying a second property of the second object to the second lazy clone to obtain a second modified lazy clone, and reading the second property from the second modified lazy clone.
Claim: 10. The system of claim 9, wherein the STM system is further configured to: perform a validation, using the first modified lazy clone that the execution of the first program replica was atomic and isolated, and commit the first property of the first modified lazy clone by writing to a corresponding property of the first object when the validation is successful.
Claim: 11. The system of claim 9, wherein the STM system is further configured to perform a validation, using the second modified lazy clone, that execution of the second program replica was atomic and isolated, wherein performing the validation comprises determining that the second property of the second modified lazy clone matches a corresponding property of the second object.
Claim: 12. The system of claim 7, wherein the first is one selected from a group consisting of an abstract syntax tree and bytecode.
Claim: 13. The system of claim 7, wherein the compiled version of the write operation comprises using a fixed offset to access one of a plurality of properties of the first object.
Claim: 14. A non-transitory computer readable medium comprising instructions which, when executed by a computer, cause a computer processor to: while executing a program in parallel: create a first program replica for a first thread, wherein the first program replica comprises a first intermediate representation of at least a portion of the program, and wherein the first intermediate representation comprises: a write operation that writes to a property of a first object, that is identified by a first identifier specifying that the first object is either a shared object or a thread local object; perform a first speculation that the first object is a thread-local object; perform a first optimization on the write operation based on the first speculation, wherein performing the first optimization comprises modifying the first intermediate representation to include compiled machine code that allows the write operation to directly write to the property of the first object; make a second determination, using the first identifier, that the first object is either a shared object or a thread-local object; when the first object is a thread-local object, performing the write operation by writing directly to the property of first object; and when the first object is a shared object: de-optimize the write operation, based on the determination, by undoing the first optimization to obtain a de-optimized write operation; add instrumentation to the first intermediate representation to implement a software transactional memory (STM) system for the de-optimized write operation, wherein adding instrumentation to the intermediate representation comprises creating a lazy clone of the first object, the lazy clone having initially no properties; perform the de-optimized write operation by copying the property of the first object to the lazy clone, and writing to the property of the lazy clone to obtain a modified lazy clone; and committing the property of the modified lazy clone by copying the property of the modified lazy clone to the property of the first object.
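Illustrative note: the lazy clone recited in the claims starts with no properties, receives a copy of a property when that property is first read or written, and is copied back on commit after a validation. The sketch below is a minimal, single-threaded reading of that behavior and not the patented implementation. `LazyClone` and its members are hypothetical names; keeping a `snapshot` of first-seen values and validating by comparing them with the current values is an assumption, loosely modeled on the matching check in claim 11. The sketch also assumes the property already exists on the target object.

```python
from typing import Any, Dict, Set


class LazyClone:
    """Transaction-private copy of a shared object, filled on demand."""

    def __init__(self, target: Dict[str, Any]) -> None:
        self.target = target
        self.props: Dict[str, Any] = {}     # initially no properties
        self.snapshot: Dict[str, Any] = {}  # values seen at first copy (assumption)
        self.written: Set[str] = set()

    def _copy_in(self, name: str) -> None:
        # Copy the property from the shared object on first access only.
        if name not in self.props:
            value = self.target[name]
            self.props[name] = value
            self.snapshot[name] = value

    def read(self, name: str) -> Any:
        self._copy_in(name)
        return self.props[name]

    def write(self, name: str, value: Any) -> None:
        self._copy_in(name)
        self.props[name] = value            # the shared object is untouched
        self.written.add(name)

    def validate(self) -> bool:
        # The transaction is treated as isolated if nothing it saw has changed.
        return all(self.target[n] == v for n, v in self.snapshot.items())

    def commit(self) -> bool:
        if not self.validate():
            return False                    # caller would retry the transaction
        for name in self.written:
            self.target[name] = self.props[name]
        return True
```

A real STM would also have to make the validate-and-commit step atomic with respect to other threads, for example with a lock or a global version clock; that part is omitted here.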
Patent References Cited: 2007/0239915 October 2007 Saha
2009/0113443 April 2009 Heller, Jr.
2010/0169870 July 2010 Dice
2010/0211931 August 2010 Levanoni
2011/0145512 June 2011 Adl-Tabatabai
2011/0145553 June 2011 Levanoni
2011/0145637 June 2011 Gray
2011/0246724 October 2011 Marathe
2012/0324472 December 2012 Rossbach
Other References: Mehrara, M., et al., “Dynamic Parallelization of JavaScript Applications Using an Ultra-lightweight Speculation Mechanism”, University of Michigan, Feb. 12-16, 2011 (pp. 87-98). cited by applicant
Dragojevic, A., et al., “Stretching Transactional Memory”, Dublin, Ireland, Jun. 15-20, 2009 (11 pages). cited by applicant
Damron, P., et al., “Hybrid Transactional Memory”, Sun Microsystems Laboratories, San Jose, California, Oct. 21-25, 2006 (11 pages). cited by applicant
Dalessandro, L., et al., “NOrec: Streamlining STM by Abolishing Ownership Records”, Bangalore, India, Jan. 9-14, 2010 (pp. 67-77). cited by applicant
Zhang, M., et al., “LarkTM: Efficient, Strongly Atomic Software Transactional Memory”, Ohio State University, updated Nov. 2013 (12 pages). cited by applicant
Schneider, F., et al., “Dynamic Optimization for Efficient Strong Atomicity”, Nashville, Tennessee, Oct. 19-23, 2008 (13 pages). cited by applicant
Korland, G., et al., “Noninvasive concurrency with Java STM”, Sun Microsystems, Jan. 2010 (14 pages). cited by applicant
Harris, T., et al., “Optimizing Memory Transactions”, Ontario, Canada, Jun. 11-14, 2006 (12 pages). cited by applicant
Hindman, B., et al., “Atomicity via Source-to-Source Translation”, San Jose, California, Oct. 22, 2006 (10 pages). cited by applicant
Tabatabai, A., et al., “Compiler and Runtime Support for Efficient Software Transactional Memory”, Ontario, Canada, Jun. 10-16, 2006 (pp. 26-37). cited by applicant
Bronson, N., et al., “Feedback-Directed Barrier Optimization in a Strongly Isolated STM”, Proceedings of the 36th annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Jan. 2009 (13 pages). cited by applicant
Bonetta, D., et al., “TigerQuoll: Parallel Event-based JavaScript”, Shenzhen, China, Feb. 23-27, 2013 (10 pages). cited by applicant
Assistant Examiner: Ayers, Michael
Primary Examiner: An, Meng
Attorney, Agent or Firm: Osha Liang LLP
Accession Number: edspgr.09766926
Database: USPTO Patent Grants