DTexL: Decoupled Raster Pipeline for Texture Locality
| Published in: | 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 213-227 |
|---|---|
| Main authors: | Joseph, Diya; Aragon, Juan L.; Parcerisa, Joan-Manuel; Gonzalez, Antonio |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 01 October 2022 |
| Subjects: | |
| Online access: | Full text |
| Abstract | Contemporary GPU architectures have multiple shader cores and a scheduler that distributes work (threads) among them, focusing on load balancing. These load balancing techniques favor thread distributions that are detrimental to texture memory locality for graphics applications in the L1 Texture Caches. Texture memory accesses make up the majority of the traffic to the memory hierarchy in typical low power graphics architectures. This paper focuses on improving the L1 Texture cache locality by focusing on a new workload scheduler by exploring various methods to group the threads, assign the groups to shader cores and also to reorder threads without violating the correctness of the pipeline. To overcome the resulting load imbalance, we also propose a minor modification in the GPU architecture that helps translate the improvement in cache locality to an improvement in the GPU's performance. We propose DTexL that envelops these ideas and evaluate it over a benchmark suite of ten commercial games, to obtain a 46.8% decrease in L2 Accesses, a 19.3% increase in performance and a 6.3% decrease in total GPU energy. All this with a negligible overhead. |
|---|---|
| Author | Parcerisa, Joan-Manuel; Gonzalez, Antonio; Joseph, Diya; Aragon, Juan L. |
| Author_xml | 1. Joseph, Diya (Universitat Politècnica de Catalunya, Barcelona, Spain); 2. Aragon, Juan L. (Universidad de Murcia, Murcia, Spain); 3. Parcerisa, Joan-Manuel (Universitat Politècnica de Catalunya, Barcelona, Spain); 4. Gonzalez, Antonio (Universitat Politècnica de Catalunya, Barcelona, Spain) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/MICRO56248.2022.00028 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9781665462723, 1665462728 |
| EndPage | 227 |
| ExternalDocumentID | 9923874 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL ACM ALMA_UNASSIGNED_HOLDINGS APO CBEJK LHSKQ RIE RIL |
| ID | FETCH-LOGICAL-a334t-d9671f02c8637c2bb30964be075df7dc85a29a55a03e35f40a92d5cac7095c8a3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 2 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000886530600013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:51:45 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a334t-d9671f02c8637c2bb30964be075df7dc85a29a55a03e35f40a92d5cac7095c8a3 |
| OpenAccessLink | http://hdl.handle.net/2117/376016 |
| PageCount | 15 |
| ParticipantIDs | ieee_primary_9923874 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-Oct. |
| PublicationDateYYYYMMDD | 2022-10-01 |
| PublicationDate_xml | – month: 10 year: 2022 text: 2022-Oct. |
| PublicationDecade | 2020 |
| PublicationTitle | 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO) |
| PublicationTitleAbbrev | MICRO |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib053578255 |
| Score | 2.2105489 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 213 |
| SubjectTerms | Caches; Focusing; GPU; Graphics; Graphics processing units; Instruction sets; Low-power; Microarchitecture; Pipelines; Scheduling; Texture Locality; Upper bound |
| Title | DTexL: Decoupled Raster Pipeline for Texture Locality |
| URI | https://ieeexplore.ieee.org/document/9923874 |
| WOSCitedRecordID | wos000886530600013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |