DTexL: Decoupled Raster Pipeline for Texture Locality

Bibliographic Details
Published in: 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 213-227
Main Authors: Joseph, Diya, Aragon, Juan L., Parcerisa, Joan-Manuel, Gonzalez, Antonio
Format: Conference Proceeding
Language: English
Published: IEEE 01.10.2022
Abstract: Contemporary GPU architectures have multiple shader cores and a scheduler that distributes work (threads) among them with a focus on load balancing. These load-balancing techniques favor thread distributions that are detrimental to texture memory locality in the L1 texture caches for graphics applications. Texture memory accesses make up the majority of the traffic to the memory hierarchy in typical low-power graphics architectures. This paper improves L1 texture cache locality through a new workload scheduler, exploring various methods to group threads, assign the groups to shader cores, and reorder threads without violating the correctness of the pipeline. To overcome the resulting load imbalance, we also propose a minor modification to the GPU architecture that helps translate the improvement in cache locality into an improvement in GPU performance. We propose DTexL, which encompasses these ideas, and evaluate it on a benchmark suite of ten commercial games, obtaining a 46.8% decrease in L2 accesses, a 19.3% increase in performance, and a 6.3% decrease in total GPU energy, all with negligible overhead.
Author_xml – sequence: 1
  givenname: Diya
  surname: Joseph
  fullname: Joseph, Diya
  organization: Universitat Politècnica de Catalunya,Barcelona,Spain
– sequence: 2
  givenname: Juan L.
  surname: Aragon
  fullname: Aragon, Juan L.
  organization: Universidad de Murcia,Murcia,Spain
– sequence: 3
  givenname: Joan-Manuel
  surname: Parcerisa
  fullname: Parcerisa, Joan-Manuel
  organization: Universitat Politècnica de Catalunya,Barcelona,Spain
– sequence: 4
  givenname: Antonio
  surname: Gonzalez
  fullname: Gonzalez, Antonio
  organization: Universitat Politècnica de Catalunya,Barcelona,Spain
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/MICRO56248.2022.00028
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781665462723
1665462728
EndPage 227
ExternalDocumentID 9923874
Genre orig-research
GroupedDBID 6IE
6IL
ACM
ALMA_UNASSIGNED_HOLDINGS
APO
CBEJK
LHSKQ
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 2
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000886530600013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:51:45 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
OpenAccessLink http://hdl.handle.net/2117/376016
PageCount 15
ParticipantIDs ieee_primary_9923874
PublicationCentury 2000
PublicationDate 2022-Oct.
PublicationDateYYYYMMDD 2022-10-01
PublicationDate_xml – month: 10
  year: 2022
  text: 2022-Oct.
PublicationDecade 2020
PublicationTitle 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)
PublicationTitleAbbrev MICRO
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib053578255
Score 2.210651
SourceID ieee
SourceType Publisher
StartPage 213
SubjectTerms Caches
Focusing
GPU
Graphics
Graphics processing units
Instruction sets
Low-power
Microarchitecture
Pipelines
Scheduling
Texture Locality
Upper bound
Title DTexL: Decoupled Raster Pipeline for Texture Locality
URI https://ieeexplore.ieee.org/document/9923874
WOSCitedRecordID wos000886530600013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D