Progress meters in parallel computing

Detailed Bibliography
Title: Progress meters in parallel computing
Patent Number: 9,477,533
Issue Date: October 25, 2016
Appl. No: 14/583254
Application Filed: December 26, 2014
Abstract: Systems and methods may provide a set of cores capable of parallel execution of threads. Each of the cores may run code that is provided with a progress meter that calculates the amount of work remaining to be performed on threads as they run on their respective cores. The progress data may be collected continuously and used to alter the frequency, speed, or other operating characteristics of individual cores as well as groups of cores. The progress meters may be annotated into existing code.
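The abstract describes progress meters annotated into existing code that track how much work a thread has left. Below is a minimal Python sketch of that idea. It assumes the work can be counted in discrete units (for example, loop iterations) known before the loop starts. `ProgressMeter` and `run_task` are hypothetical names, not an Intel API.

```python
from dataclasses import dataclass


@dataclass
class ProgressMeter:
    """Hypothetical per-task progress meter (illustrative names, not an Intel API).

    total_work is computed before the task starts, e.g. the number of loop
    iterations assigned to this thread; completed advances as the thread runs.
    """

    total_work: int
    completed: int = 0

    def advance(self, units: int = 1) -> None:
        # Record finished work, never counting past the precomputed total.
        self.completed = min(self.total_work, self.completed + units)

    def work_fraction(self) -> float:
        # Fraction of the task's work completed so far, from 0.0 to 1.0.
        if self.total_work == 0:
            return 1.0
        return self.completed / self.total_work

    def work_remaining(self) -> int:
        # Units of work still to be performed before the task is done.
        return self.total_work - self.completed


def run_task(items: list[int], meter: ProgressMeter) -> int:
    """An existing loop, annotated with one progress-meter call per iteration."""
    total = 0
    for x in items:
        total += x * x  # the application's original work
        meter.advance()  # the added annotation
    return total
```

With `meter = ProgressMeter(total_work=4)`, calling `run_task([1, 2, 3, 4], meter)` returns 30 and leaves `meter.work_fraction()` at 1.0. If a monitor reads the meter while the thread is still updating it, the updates would need atomics or locking. This sketch omits both.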
Inventors: Intel Corporation (Santa Clara, CA, US)
Assignees: Intel Corporation (Santa Clara, CA, US)
Claim: 1. A method of controlling a computational resource, comprising: globally synchronizing a plurality of tasks across a plurality of computational resources; computing an amount of work to complete at least one task of the plurality of tasks; processing the plurality of tasks in parallel to accomplish work corresponding to each task of the plurality of tasks; repeatedly computing a work fraction that corresponds to one or more of a fraction of work completed or work remaining to be completed with respect to the amount of work to complete the at least one task of the plurality of tasks; calculating a skew of a plurality of work fractions taken from the plurality of computational resources, wherein the skew is a variance of the work fractions divided by a mean of the work fractions; and modifying a characteristic of at least one computational resource of the plurality of computational resources based on the work fraction and the skew.
Claim: 2. The method of claim 1, wherein the plurality of computational resources includes a plurality of cores, and wherein a frequency of at least one core of the plurality of cores is varied based on the work fraction.
Claim: 3. The method of claim 1, wherein the plurality of computational resources includes one or more of a core, a processor, a multi-core processor, a node, a cabinet, a cluster, a row, or a grid, and wherein at least a portion of the plurality of computational resources are in communication with one another.
Claim: 4. The method of claim 1, wherein the plurality of tasks includes a plurality of threads, and wherein the plurality of computational resources includes a plurality of cores.
Claim: 5. The method of claim 1, further including: reporting the work fraction by one or more of an application or an Application Programming Interface (API); and receiving an indication of the work fraction at a runtime monitor.
Claim: 6. The method of claim 1, further including modifying one or more of a number, a distribution, a speed, or a frequency of at least one of the plurality of computational resources.
Claim: 7. The method of claim 1, wherein the characteristic includes a speed, and wherein the speed of at least one computational resource of the plurality of computational resources is modified by changing an amount of electrical power provided to the at least one computational resource.
Claim: 8. The method of claim 1, wherein the plurality of computational resources includes a plurality of nodes.
Claim: 9. The method of claim 1, further including synchronizing the plurality of tasks at a barrier, wherein each task of the plurality of tasks includes a waiting time at the barrier, and wherein the method further includes repeatedly modifying the characteristic to reduce the waiting time for the at least one task.
Claim: 10. An apparatus to process tasks, comprising: a plurality of computational resources to process a plurality of tasks in parallel, wherein the plurality of tasks are to be globally synchronized across the plurality of computational resources; progress meter logic, implemented at least partly in fixed functionality hardware, to: compute an amount of work to complete at least one task of the plurality of tasks; and repeatedly compute a work fraction that is to correspond to one or more of a fraction of work completed or work remaining to be completed with respect to the amount of work to complete the at least one task; skew calculator logic to compute a skew of a plurality of work fractions taken from the plurality of computational resources, wherein the skew is a variance of the work fractions divided by a mean of the work fractions; and performance balancer logic, implemented at least partly in fixed functionality hardware, to modify a characteristic of at least one computational resource of the plurality of computational resources based on the work fraction and the skew.
Claim: 11. The apparatus of claim 10, wherein the plurality of computational resources is to include a plurality of cores, and wherein the performance balancer logic is to vary a frequency of at least one core of the plurality of cores based on the work fraction.
Claim: 12. The apparatus of claim 10, wherein the performance balancer logic is to vary a speed of at least one of the plurality of computational resources by varying an amount of power supplied to the at least one of the plurality of computational resources.
Claim: 13. The apparatus of claim 10, wherein the performance balancer logic is to vary a speed of at least two of the plurality of computational resources by steering power from a relatively faster one of the plurality of computational resources toward a relatively slower one of the plurality of computational resources.
Claim: 14. The apparatus of claim 10, wherein the computational resources are to include a plurality of cores, and wherein the performance balancer logic is to vary a speed of at least one of the plurality of cores by varying an amount of power provided to the at least one of the plurality of cores.
Claim: 15. The apparatus of claim 10, further including runtime monitor logic, implemented at least partly in fixed functionality hardware, to receive information from the progress meter logic that is to be indicative of the work fraction.
Claim: 16. The apparatus of claim 10, wherein the plurality of computational resources are to include one or more of a core, a processor, a multi-core processor, a node, a cabinet, a cluster, a row, or a grid, and wherein at least a portion of the plurality of computational resources are to have a communications channel therebetween.
Claim: 17. The apparatus of claim 10, wherein the plurality of computational resources includes a plurality of nodes.
Claim: 18. The apparatus of claim 10, wherein the performance balancer logic is to modify one or more of a number, a distribution, a speed, or a frequency of at least one of the plurality of computational resources.
Claim: 19. At least one non-transitory computer readable storage medium comprising one or more instructions that when executed on a computing device cause the computing device to: globally synchronize a plurality of tasks across a plurality of computational resources; compute an amount of work to complete at least one task of the plurality of tasks; process the plurality of tasks in parallel to accomplish work corresponding to each task of the plurality of tasks; repeatedly compute a work fraction that corresponds to one or more of a fraction of work completed or work remaining to be completed with respect to the amount of work to complete the at least one task of the plurality of tasks; calculate a skew of a plurality of work fractions taken from the plurality of computational resources, wherein the skew is a variance of the work fractions divided by a mean of the work fractions; and modify a characteristic of at least one computational resource of the plurality of computational resources based on the work fraction and the skew.
Claim: 20. The at least one non-transitory computer readable storage medium of claim 19, wherein the plurality of computational resources is to include a plurality of cores, and wherein the instructions, when executed on a computing device, cause the computing device to modify a frequency of at least one of the plurality of cores.
Claim: 21. The at least one non-transitory computer readable storage medium of claim 19, wherein the instructions, when executed, cause the computing device to: compute the work fraction; and receive information from the progress meter indicative of the work fraction.
Claim: 22. The at least one non-transitory computer readable storage medium of claim 19, wherein the instructions, when executed, cause the computing device to vary a characteristic of operation of at least one computational resource of the plurality of computational resources.
Claim: 23. The at least one non-transitory computer readable storage medium of claim 19, wherein the instructions, when executed, cause the computing device to vary an amount of power provided to at least one core of the plurality of cores.
Claim: 24. The at least one non-transitory computer readable storage medium of claim 19, wherein the instructions, when executed, cause the computing device to allow the plurality of tasks to synchronize at a barrier.
Claim: 25. The at least one non-transitory computer readable storage medium of claim 19, wherein each task of the plurality of tasks includes a waiting time at the barrier, and wherein the instructions, when executed, cause the computing device to repeatedly modify the characteristic to reduce a waiting time for at least one task.
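Claims 1, 10, and 19 define the skew of the collected work fractions as their variance divided by their mean. This statistic is often called the index of dispersion. The sketch below computes it in Python. As one possible policy, it then flags resources that lag the mean once the skew crosses a threshold. The claims do not specify which variance to use, the `threshold` value, or the `laggards` helper. Population variance, the threshold, and the helper are all assumptions for illustration.

```python
from statistics import mean, pvariance


def skew(work_fractions: list[float]) -> float:
    """Variance of the work fractions divided by their mean, per claims 1, 10, and 19.

    Population variance is an assumption; the claims do not say which variance.
    """
    m = mean(work_fractions)
    if m == 0:
        return 0.0  # no resource has made progress yet; treat as balanced
    return pvariance(work_fractions) / m


def laggards(work_fractions: list[float], threshold: float) -> list[int]:
    """Indices of resources behind the mean once skew exceeds a hypothetical threshold."""
    if skew(work_fractions) <= threshold:
        return []
    m = mean(work_fractions)
    return [i for i, f in enumerate(work_fractions) if f < m]
```

Take the fractions `[0.2, 0.6, 0.6, 0.6]` as an example. The mean is 0.5 and the population variance is 0.03, so the skew is 0.06. With a threshold of 0.01, only resource 0 is flagged.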
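Claims 2, 11, and 20 vary a core's frequency based on its work fraction. One simple policy is sketched here under stated assumptions. It converts each fraction into remaining work and picks per-core frequencies so that all cores would reach the next synchronization point together. The core with the most remaining work runs at `f_max`, and the others slow down in proportion, never dropping below `f_min`. The sketch assumes run time scales as remaining work divided by frequency, which ignores memory-bound phases. The function name and parameters are hypothetical.

```python
def balance_frequencies(
    fractions: list[float],
    total_work: list[float],
    f_min: float,
    f_max: float,
) -> list[float]:
    """Pick a frequency per core so all cores finish at about the same time.

    fractions[i] is core i's completed work fraction (0.0 to 1.0), and
    total_work[i] is the total work of its task in arbitrary units.
    """
    remaining = [(1.0 - f) * w for f, w in zip(fractions, total_work)]
    worst = max(remaining)
    if worst <= 0:
        # Every core is done; drop all of them to the lowest frequency.
        return [f_min] * len(remaining)
    # The most-loaded core sets the pace at f_max; others scale down
    # proportionally (time ~ remaining / frequency), clamped at f_min.
    return [max(f_min, f_max * r / worst) for r in remaining]
```

Consider three equal 100-unit tasks at fractions `[0.0, 0.5, 0.75]`, with `f_min=1.0` and `f_max=3.0`. The sketch yields `[3.0, 1.5, 1.0]`. The third core would ideally run at 0.75, but it is clamped to the floor.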
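Claims 5, 15, and 21 have the application (or an API) report work fractions to a runtime monitor. Below is a minimal thread-safe sketch in Python. It assumes reports travel through a standard-library `queue.Queue` and that the monitor keeps only the latest fraction per task. `RuntimeMonitor`, `report`, `drain`, and `worker` are hypothetical names.

```python
import queue
import threading


class RuntimeMonitor:
    """Hypothetical runtime monitor that collects work-fraction reports."""

    def __init__(self) -> None:
        self._reports = queue.Queue()  # holds (task_id, fraction) tuples
        self.latest: dict[int, float] = {}

    def report(self, task_id: int, fraction: float) -> None:
        # The call an annotated application would make as it progresses.
        # queue.Queue is thread-safe, so many workers may call this at once.
        self._reports.put((task_id, fraction))

    def drain(self) -> dict[int, float]:
        # Consume pending reports; a later report for a task replaces an earlier one.
        while True:
            try:
                task_id, fraction = self._reports.get_nowait()
            except queue.Empty:
                break
            self.latest[task_id] = fraction
        return dict(self.latest)


def worker(task_id: int, n_items: int, monitor: RuntimeMonitor) -> None:
    for i in range(n_items):
        # ... the task's real work on item i would go here ...
        monitor.report(task_id, (i + 1) / n_items)


if __name__ == "__main__":
    mon = RuntimeMonitor()
    threads = [threading.Thread(target=worker, args=(t, 4 + t, mon)) for t in range(3)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    print(sorted(mon.drain().items()))  # [(0, 1.0), (1, 1.0), (2, 1.0)]
```

A queue decouples the reporting threads from the monitor, so workers never block on the monitor's bookkeeping. A hardware implementation, as in claim 15, would replace this with its own signaling path.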
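Claims 9 and 25 repeatedly adjust a resource characteristic to shrink the time tasks spend waiting at a barrier. The sketch below works in three steps:

- It models each task's arrival at the barrier as its work divided by its speed.
- It measures each task's wait as the gap between its arrival and the slowest arrival.
- It re-targets every speed toward the mean arrival time, clamped to `[s_min, s_max]`.

It assumes every interval between barriers carries the same per-task work, as in many iterative solvers. The model and all names are illustrative assumptions, not the patented mechanism.

```python
def barrier_waits(work: list[float], speeds: list[float]) -> list[float]:
    """Time each task idles at the barrier: slowest arrival minus its own arrival."""
    arrivals = [w / s for w, s in zip(work, speeds)]
    release = max(arrivals)
    return [release - t for t in arrivals]


def rebalance(work: list[float], speeds: list[float], s_min: float, s_max: float) -> list[float]:
    """One adjustment step: aim every task at the current mean arrival time."""
    arrivals = [w / s for w, s in zip(work, speeds)]
    target = sum(arrivals) / len(arrivals)
    # A task with work w reaches the barrier at `target` if it runs at w / target.
    return [min(s_max, max(s_min, w / target)) for w in work]


if __name__ == "__main__":
    work = [8.0, 2.0]  # per-interval work, assumed identical every interval
    speeds = [1.0, 1.0]
    for step in range(4):
        print(step, sum(barrier_waits(work, speeds)))
        speeds = rebalance(work, speeds, s_min=0.5, s_max=2.0)
```

In this example the total wait starts at 6.0 and then falls to about 1.0, 0.5, and 0.25. The light task cannot slow below `s_min`, so the heavy task's speed keeps rising toward the light task's arrival time. This repeated convergence is the behavior claim 9 describes.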
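Claim 13 speeds up a lagging resource by steering power away from one that is ahead. Below is a minimal sketch under these assumptions:

- a fixed total power budget in watts;
- a per-resource floor, `p_min`;
- a fixed `step` moved per control interval.

Reading "faster" as the resource with the highest work fraction and "slower" as the one with the lowest is an interpretation. `steer_power` is a hypothetical helper.

```python
def steer_power(
    power: list[float],
    fractions: list[float],
    step: float,
    p_min: float,
) -> list[float]:
    """Move up to `step` watts from the furthest-ahead resource to the furthest-behind.

    The total budget sum(power) is unchanged, and no resource drops below p_min.
    """
    new = list(power)
    ahead = max(range(len(fractions)), key=lambda i: fractions[i])
    behind = min(range(len(fractions)), key=lambda i: fractions[i])
    if ahead == behind:
        return new  # a single resource, or all equal: nothing to balance
    moved = min(step, new[ahead] - p_min)
    if moved > 0:
        new[ahead] -= moved
        new[behind] += moved
    return new
```

For budgets `[50.0, 50.0, 50.0]` at fractions `[0.9, 0.4, 0.6]`, with `step=10.0` and `p_min=20.0`, the result is `[40.0, 60.0, 50.0]`. A real controller would still have to map these budgets to voltage and frequency settings through platform power interfaces. This sketch leaves that mapping out.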
Patent References Cited: 2006/0212868 September 2006 Takayama et al.
2008/0127145 May 2008 So et al.
2011/0055838 March 2011 Moyes
2013/0145379 June 2013 Faraj
2013/0283277 October 2013 Cai
2013/0346997 December 2013 Blocksome et al.
2014/0053163 February 2014 Yamauchi et al.

Other References: Sun Microsystems, Inc., Thread Pools, 2005. cited by examiner
Craeynest et al., Fairness-Aware Scheduling on Single-ISA Heterogeneous Multi-Cores, 2013. cited by examiner
Cai et al., Meeting Points: Using Thread Criticality to Adapt Multicore Hardware to Parallel Regions, 2008. cited by examiner
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2015/065761, mailed on Mar. 24, 2016, 10 pages. cited by applicant
Assistant Examiner: Sun, Charlie
Primary Examiner: Puente, Emerson
Attorney, Agent or Firm: Jordan IP Law, LLC
Accession Number: edspgr.09477533
Database: USPTO Patent Grants