Method for executing computation, computing device, computing system, and storage medium
| Title: | Method for executing computation, computing device, computing system, and storage medium |
|---|---|
| Patent Number: | 11,886,846 |
| Publication Date: | January 30, 2024 |
| Appl. No: | 17/686413 |
| Application Filed: | March 04, 2022 |
| Abstract: | A method for executing computation, a computing device, a computing system, and a storage medium are provided. The method includes: confirming, via a compiler, whether there is a call instruction related to a thread block modification request in a kernel function to be compiled; in response to confirming that there is the call instruction related to the thread block modification request in the kernel function to be compiled, determining a corresponding program segment associated with the call instruction; configuring a required thread block and thread local register for the corresponding program segment; and inserting a control instruction into the corresponding program segment to enable the thread block configured for the corresponding program segment to execute relevant computation of the corresponding program segment, and an unconfigured thread block not to execute the relevant computation. The disclosure can improve overall performance, make coding and maintenance easy and reduce error rate of code. |
| Inventors: | Shanghai Biren Technology Co., Ltd (Shanghai, CN) |
| Assignees: | Shanghai Biren Technology Co., Ltd (Shanghai, CN) |
| Claim: | 1. A method for executing computation, comprising: confirming, via a compiler, whether there is a call instruction related to a thread block modification request in a kernel function to be compiled; in response to confirming that there is the call instruction related to the thread block modification request in the kernel function to be compiled, determining a corresponding program segment associated with the call instruction; configuring a required thread resource for the corresponding program segment; and inserting a control instruction into the corresponding program segment to enable the thread resource configured for the corresponding program segment to execute relevant computation of the corresponding program segment, wherein an unconfigured thread resource does not execute the relevant computation, wherein enabling the thread resource configured for the corresponding program segment to execute the relevant computation of the corresponding program segment comprises: controlling an executive stream through a control flow code to encapsulate the corresponding program segment, so that when a thread identifier of a current thread is determined to belong to a thread block identifier set configured for the corresponding program segment, the current thread is used to execute the relevant computation of the corresponding program segment; and when the thread identifier of the current thread is determined to not belong to the thread block identifier set configured for the corresponding program segment, the current thread is skipped. |
| Claim: | 2. The method according to claim 1, wherein determining the corresponding program segment associated with the call instruction comprises: dividing the kernel function to be compiled into a plurality of corresponding program segments based on a number of call instructions. |
| Claim: | 3. The method according to claim 2, wherein configuring the required thread resource for the corresponding program segment comprises: configuring a required thread block and thread local register for each of the corresponding program segments. |
| Claim: | 4. The method according to claim 1, further comprising: in response to confirming that a compiled kernel function is started, configuring a numbering of a thread block and assigning a thread local register based on a maximum number of threads required by the compiled kernel function, wherein the compiled kernel function comprises a plurality of corresponding program segments respectively inserted with the control instruction. |
| Claim: | 5. The method according to claim 1, further comprising: in response to the control instruction of the corresponding program segment being called, performing followings: synchronizing all threads of a thread block configured for the corresponding program segment; allocating a thread local register for each of the threads of the configured thread block; and reconfiguring a numbering of each of the threads, so that the reconfigured numbering is written to the assigned thread local register. |
| Claim: | 6. The method according to claim 1, further comprising: starting, via a driver, a compiled kernel function; confirming, via the compiler or the driver, a maximum number of threads required by the compiled kernel function; and requesting, via the driver, the maximum number of threads from a graphics processor. |
| Claim: | 7. The method according to claim 1, wherein confirming whether there is the call instruction related to the thread block modification request in the kernel function to be compiled comprises: scanning, via the compiler, the kernel function to be compiled to confirm whether the kernel function to be compiled comprises the call instruction related to the thread block modification request, wherein the call instruction related to the thread block modification request indicates a thread configuration required by the corresponding program segment. |
| Claim: | 8. A computing device comprising: at least one processor; and a memory, communicatively connected to the at least one processor, wherein the memory stores an instruction executable by the at least one processor, the instruction is executed by the at least one processor, so that the at least one processor executes the method according to claim 1. |
| Claim: | 9. A non-transitory computer-readable storage medium, storing a computer program, wherein when the computer program is executed by a machine, the method according to claim 1 is executed. |
| Claim: | 10. A computing system, comprising: a memory comprising a compiler module; a central processor communicatively connected to the memory and comprising a driver module; and a graphics processor communicatively connected to central processor and comprising an execution unit computation core module; wherein the central processor is configured to execute the compiler module to identify a call instruction related to a thread block modification request comprised in a received kernel function to be compiled, and to convert the kernel function to be compiled of the call instruction to generate a compiled kernel function, wherein a corresponding program segment comprised in the compiled kernel function is configured with a control instruction; wherein the central processor is configured to execute the driver module to determine a maximum number of threads required by the compiled kernel function to start based on the compiled kernel function; and wherein the graphics processor is configured to execute the execution unit computation core module to modify a function call of a thread block in response to the control instruction of the corresponding program segment in the compiled kernel function being executed. |
| Claim: | 11. The computing system according to claim 10, wherein the compiler module is further configured to: when the kernel function to be compiled is confirmed to comprise the call instruction related to the thread block modification request, configure a required thread resource for the corresponding program segment associated with the call instruction; and insert the control instruction into the corresponding program segment to enable the thread resource configured for the corresponding program segment to execute relevant computation of the corresponding program segment, wherein an unconfigured thread resource does not execute the relevant computation. |
| Claim: | 12. The computing system according to claim 10, wherein modifying the function call of the thread block comprises: resuming or pausing a thread, and assigning a thread local register for each configured thread. |
| Patent References Cited: | 10599404 (March 2020, Neto); 20140026139 (January 2014, Takahara); 20180314520 (November 2018, Tirumala); 20190377582 (December 2019, Guérin); 20200285473 (September 2020, Foo) |
| Primary Examiner: | Nahar, Qamrun |
| Attorney, Agent or Firm: | JCIP GLOBAL INC. |
| Accession Number: | edspgr.11886846 |
| Database: | USPTO Patent Grants |
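
The guarding idea in claim 1, together with the maximum-thread request in claims 4 and 6, can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions and not the patented implementation: `Segment`, `guard`, and `launch` are hypothetical names, the thread identifier set of each segment is assumed to be the contiguous range `0..num_threads-1`, and the parallel threads are replaced by a sequential loop. The synchronization, register allocation, and renumbering of claim 5 are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Trace = List[Tuple[str, int]]  # (segment name, thread id) for every executed step


@dataclass
class Segment:
    """Hypothetical model of one program segment and its requested thread count."""
    name: str
    num_threads: int


def guard(segment: Segment) -> Callable[[int, Trace], None]:
    """Wrap a segment in the control-flow check described in claim 1."""
    # Assumption: the configured identifier set is the range 0..num_threads-1.
    active_ids = range(segment.num_threads)

    def guarded(tid: int, trace: Trace) -> None:
        if tid in active_ids:
            # Configured thread: execute the segment's computation.
            trace.append((segment.name, tid))
        # Unconfigured thread: skipped, nothing is executed.

    return guarded


def launch(segments: List[Segment]) -> Tuple[int, Trace]:
    """Request the maximum thread count once, then run each guarded segment."""
    max_threads = max(s.num_threads for s in segments)
    compiled = [guard(s) for s in segments]  # stand-in for the compiler pass
    trace: Trace = []
    for run_segment in compiled:
        # Sequential stand-in for max_threads threads running in parallel.
        for tid in range(max_threads):
            run_segment(tid, trace)
    return max_threads, trace


# Segment "A" asks for 4 threads, segment "B" for only 2.
max_threads, trace = launch([Segment("A", 4), Segment("B", 2)])
print(max_threads, trace)
```

In this sketch the kernel is launched with four threads because that is the largest request. All four execute segment "A", while threads 2 and 3 fall through the guard in segment "B" without doing any work.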