Thread serialization, distributed parallel programming, and runtime extensions of parallel computing platform

Bibliographic Details
Title: Thread serialization, distributed parallel programming, and runtime extensions of parallel computing platform
Patent Number: 11,257,180
Publication Date: February 22, 2022
Appl. No: 16/929,790
Application Filed: July 15, 2020
Abstract: Systems, apparatuses, and methods may provide for technology to process graphical data, and to modify a runtime environment in a parallel computing platform for a graphic environment.
Inventors: Intel Corporation (Santa Clara, CA, US)
Assignees: Intel Corporation (Santa Clara, CA, US)
Claim: 1. A semiconductor package apparatus, comprising: a substrate; and a host graphics processor operatively coupled to the substrate, the host graphics processor including logic that is at least partially implemented in one or more of configurable logic or fixed-functionality hardware logic, the logic to: detect, in a runtime environment, a graphic application for use in a parallel computing platform by detecting active execution lanes of the graphic application, wherein detecting active execution lanes comprises utilizing a predetermined value; and modify, during runtime and in response to detecting the graphic application, source code in connection with the parallel computing platform by adding at least one source code extension that is to execute a loop body once for each unique value that different threads in a warp hold as the predetermined value.
Claim: 2. The semiconductor package apparatus of claim 1, wherein: for each loop iteration, execution lanes whose value matches the predetermined value are active; and for each loop iteration, execution lanes whose value does not match the predetermined value are inactive.
Claim: 3. The semiconductor package apparatus of claim 2, wherein modifying the source code comprises executing a loop body once for each detected active execution lane.
Claim: 4. The semiconductor package apparatus of claim 1, wherein the at least one source code extension includes an iteration construct that is to iterate over elements of a varying predetermined variable, and process subsets of the elements that share a same value.
Claim: 5. The semiconductor package apparatus of claim 1, wherein the at least one source code extension includes an iteration construct that is to perform reduction across all lanes in the warp by adding values of a given value in each detected active execution lane.
Claim: 6. A graphics processing system, comprising: a memory; and a semiconductor package apparatus operatively coupled to the memory, the semiconductor package apparatus including a substrate and a host graphics processor operatively coupled to the substrate, wherein the host graphics processor includes logic to: detect, in a runtime environment, a graphic application for use in a parallel computing platform by detecting active execution lanes of the graphic application, wherein detecting active execution lanes comprises utilizing a predetermined value; and modify, during runtime and in response to detecting the graphic application, source code in connection with the parallel computing platform by adding at least one source code extension that is to execute a loop body once for each unique value that different threads in a warp hold as the predetermined value.
Claim: 7. The graphics processing system of claim 6, wherein: for each loop iteration, execution lanes whose value matches the predetermined value are active; and for each loop iteration, execution lanes whose value does not match the predetermined value are inactive.
Claim: 8. The graphics processing system of claim 7, wherein modifying the source code comprises executing a loop body once for each detected active execution lane.
Claim: 9. At least one non-transitory computer readable medium, comprising a set of instructions, which when executed by a host graphics processor, cause the host graphics processor to: detect, in a runtime environment, a graphic application for use in a parallel computing platform by detecting active execution lanes of the graphic application, wherein detecting active execution lanes comprises utilizing a predetermined value; and modify, during runtime and in response to detecting the graphic application, source code in connection with the parallel computing platform by adding at least one source code extension that is to execute a loop body once for each unique value that different threads in a warp hold as the predetermined value.
Claim: 10. The at least one non-transitory computer readable medium of claim 9, wherein: for each loop iteration, execution lanes whose value matches the predetermined value are active; and for each loop iteration, execution lanes whose value does not match the predetermined value are inactive.
Claim: 11. The at least one non-transitory computer readable medium of claim 10, wherein modifying the source code comprises executing a loop body once for each detected active execution lane.
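Illustrative note (not part of the patent record): the claims above describe a loop construct whose body runs once for each unique value held by the threads of a warp, with only the matching lanes active in each iteration, plus a reduction that adds values across active lanes. The following is a minimal CPU-side sketch of that behavior in Python, not the patented implementation. The function names, the 8-lane warp width, the sample data, and the choice of taking the first pending lane's value as the "predetermined value" for each iteration are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence


def run_once_per_unique_value(
    keys: Sequence[int],
    active: Sequence[bool],
    body: Callable[[int, List[int]], None],
) -> int:
    """Simulate a per-unique-value loop over the lanes of one warp.

    Hypothetical helper: each iteration picks the key held by the first
    still-pending lane, treats every pending lane with the same key as
    active, and runs `body` once for that group. Returns the number of
    iterations, which equals the number of distinct keys among the
    initially active lanes.
    """
    pending = [lane for lane, is_active in enumerate(active) if is_active]
    iterations = 0
    while pending:
        key = keys[pending[0]]  # assumed stand-in for the "predetermined value"
        matching = [lane for lane in pending if keys[lane] == key]
        body(key, matching)  # lanes outside `matching` are inactive here
        pending = [lane for lane in pending if keys[lane] != key]
        iterations += 1
    return iterations


def reduce_add(values: Sequence[int], lanes: Sequence[int]) -> int:
    """Add one value per active lane (a simple cross-lane reduction)."""
    return sum(values[lane] for lane in lanes)


# Assumed sample data: an 8-lane "warp" with lane 7 masked off.
keys = [3, 7, 3, 1, 7, 7, 3, 1]
payload = [10, 20, 30, 40, 50, 60, 70, 80]
active = [True, True, True, True, True, True, True, False]

sums: Dict[int, int] = {}


def accumulate(key: int, lanes: List[int]) -> None:
    # Loop body: reduce the payload over the lanes that share `key`.
    sums[key] = reduce_add(payload, lanes)


iterations = run_once_per_unique_value(keys, active, accumulate)
print(iterations, sums)
```

With this sample data the body runs three times, once each for the keys 3, 7, and 1, even though seven lanes are active. On real hardware the grouping would typically rely on warp-level intrinsics rather than Python lists; the sketch only mirrors the control flow described in the claims.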
Patent References Cited: 7475397 January 2009 Garthwaite et al.
7586493 September 2009 Sams
7650602 January 2010 Amamiya et al.
7969444 June 2011 Biermann et al.
8527988 September 2013 Rhine
9058678 June 2015 Le Grand
9129443 September 2015 Gruen et al.
9165399 October 2015 Uralsky et al.
9177413 November 2015 Tatarinov et al.
9241146 January 2016 Neill
9262797 February 2016 Minkin et al.
9342857 May 2016 Kubisch et al.
9355483 May 2016 Lum et al.
9437040 September 2016 Lum et al.
9766938 September 2017 Munsh et al.
10599404 March 2020 Neto
2006/0020701 January 2006 Parekh et al.
2006/0250387 November 2006 Alcorn et al.
2009/0006520 January 2009 Abib et al.
2009/0135180 May 2009 Li
2009/0328017 December 2009 Larsen et al.
2010/0169895 July 2010 Dice et al.
2010/0211371 August 2010 Kim et al.
2011/0035736 February 2011 Stefansson et al.
2011/0037528 February 2011 Guo
2011/0087815 April 2011 Kruglick
2011/0227934 September 2011 Sharp
2011/0283086 November 2011 Mejdrich et al.
2012/0075319 June 2012 Dally
2012/0192198 July 2012 Becchi et al.
2012/0306877 December 2012 Rosasco
2013/0038618 February 2013 Urbach
2013/0055072 February 2013 Arnold et al.
2013/0113803 May 2013 Bakedash et al.
2014/0118351 May 2014 Uralsky et al.
2014/0125650 May 2014 Neill
2014/0168035 June 2014 Luebke et al.
2014/0168242 June 2014 Kubisch et al.
2014/0168783 June 2014 Luebke et al.
2014/0218390 August 2014 Rouet et al.
2014/0253555 September 2014 Lum et al.
2014/0267238 September 2014 Lum et al.
2014/0267315 September 2014 Minkin et al.
2014/0292771 October 2014 Kubisch et al.
2014/0327690 November 2014 Mcguire et al.
2014/0347359 November 2014 Gruen et al.
2014/0354675 December 2014 Lottes
2015/0002508 January 2015 Tatarinov et al.
2015/0009306 January 2015 Moore
2015/0022537 January 2015 Lum et al.
2015/0049104 February 2015 Lum et al.
2015/0130915 May 2015 More et al.
2015/0138065 May 2015 Alfierri
2015/0138228 May 2015 Lum et al.
2015/0170408 June 2015 He et al.
2015/0170409 June 2015 He et al.
2015/0187129 July 2015 Sloan
2015/0194128 July 2015 Hicok
2015/0199787 July 2015 Pechanec et al.
2015/0242210 August 2015 Kim
2015/0264299 September 2015 Leech et al.
2015/0301826 October 2015 Sideris
2015/0317827 November 2015 Crassin et al.
2016/0048999 February 2016 Patney et al.
2016/0049000 February 2016 Patney et al.
2016/0071242 March 2016 Uralsky et al.
2016/0071246 March 2016 Uralsky et al.
2016/0292810 October 2016 Fine et al.
2017/0024316 January 2017 Park et al.
2017/0024924 January 2017 Wald et al.
2017/0032488 February 2017 Nystad
2017/0168546 June 2017 Meswani
2017/0256018 September 2017 Gandhi et al.
2017/0277460 September 2017 Li et al.
2018/0165131 June 2018 O'Hare et al.
Other References: Xu et al., “PATS: Pattern Aware Scheduling and Power Gating for GPGPUs”, 2014, pp. 225-236 (Year: 2014). cited by examiner
Nicholas Wilt, “The CUDA Handbook: A Comprehensive Guide to GPU Programming”, 522 pages, Jun. 2013, Addison-Wesley, USA. cited by applicant
Shane Cook, “CUDA Programming: A Developer's Guide to Parallel Computing with GPUs”, 591 pages, 2013, Elsevier, USA. cited by applicant
European Search Report for European Patent Application No. 18167860.8, dated Sep. 12, 2018, 8 pages. cited by applicant
Restriction Requirement for U.S. Appl. No. 15/488,842, dated Jun. 1, 2018, 6 pages. cited by applicant
Office Action for U.S. Appl. No. 15/488,842, dated Sep. 14, 2018, 17 pages. cited by applicant
Office Action for U.S. Appl. No. 15/488,842, dated Jan. 25, 2019, 17 pages. cited by applicant
Advisory Action for U.S. Appl. No. 15/488,842, dated Jun. 13, 2019, 3 pages. cited by applicant
Office Action for U.S. Appl. No. 15/488,842, dated Nov. 6, 2019, 9 pages. cited by applicant
Notice of Allowance for U.S. Appl. No. 15/488,842, dated Mar. 6, 2020, 12 pages. cited by applicant
Primary Examiner: Nguyen, Phong X
Attorney, Agent or Firm: Jordan IP Law, LLC
Accession Number: edspgr.11257180
Database: USPTO Patent Grants