DTexL: Decoupled Raster Pipeline for Texture Locality

Contemporary GPU architectures have multiple shader cores and a scheduler that distributes work (threads) among them, focusing on load balancing. These load balancing techniques favor thread distributions that are detrimental to texture memory locality for graphics applications in the L1 Texture Cac...

Bibliographic details
Published in: 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 213-227
Main authors: Joseph, Diya; Aragon, Juan L.; Parcerisa, Joan-Manuel; Gonzalez, Antonio
Format: Conference paper
Language: English
Published: IEEE, 1 October 2022
Abstract Contemporary GPU architectures have multiple shader cores and a scheduler that distributes work (threads) among them with a focus on load balancing. These load-balancing techniques favor thread distributions that are detrimental to texture memory locality in the L1 texture caches for graphics applications. Texture memory accesses make up the majority of the traffic to the memory hierarchy in typical low-power graphics architectures. This paper improves L1 texture cache locality through a new workload scheduler, exploring various methods to group threads, assign the groups to shader cores, and reorder threads without violating the correctness of the pipeline. To overcome the resulting load imbalance, we also propose a minor modification to the GPU architecture that helps translate the improvement in cache locality into an improvement in GPU performance. We propose DTexL, which encompasses these ideas, and evaluate it over a benchmark suite of ten commercial games, obtaining a 46.8% decrease in L2 accesses, a 19.3% increase in performance, and a 6.3% decrease in total GPU energy, all with negligible overhead.
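Illustrative note: the abstract's claim that load-balanced thread distribution can hurt L1 texture locality can be seen in a toy model. The sketch below is not the DTexL scheduler and is not taken from the paper. It assumes a hypothetical 8x8 grid of screen tiles, 4 shader cores with private L1 caches of unlimited capacity, and a made-up texture footprint in which each tile touches a 2x2 patch of texture blocks shared with its neighbours. All function names are invented for this example. It counts how many distinct texture blocks each core has to fetch under an interleaved assignment versus a spatially grouped one.

```python
from collections import defaultdict


def footprint(x, y):
    """Texture blocks touched by tile (x, y) in this toy model.

    Assumption: each tile reads a 2x2 patch of blocks, so adjacent tiles overlap.
    """
    return {(x + dx, y + dy) for dx in (0, 1) for dy in (0, 1)}


def total_l1_fills(assign, width=8, height=8):
    """Sum, over all cores, of the distinct texture blocks each core touches.

    With private, unbounded L1 caches this equals the number of cold misses,
    used here as a rough proxy for traffic sent to the next cache level.
    """
    per_core = defaultdict(set)
    for y in range(height):
        for x in range(width):
            per_core[assign(x, y)] |= footprint(x, y)
    return sum(len(blocks) for blocks in per_core.values())


def interleaved(x, y, width=8, cores=4):
    """Round-robin over the linear tile index: even load, scattered tiles."""
    return (y * width + x) % cores


def quadrants(x, y, width=8, height=8):
    """Spatial grouping: each core owns one contiguous quadrant of the screen."""
    return ((2 * y) // height) * 2 + ((2 * x) // width)


if __name__ == "__main__":
    print("interleaved fills:", total_l1_fills(interleaved))
    print("grouped fills:    ", total_l1_fills(quadrants))
```

In this uniform toy both assignments give every core 16 tiles, yet the grouped assignment fetches 100 blocks in total against 144 for the interleaved one, because neighbouring tiles on the same core reuse each other's blocks. The toy ignores the load imbalance that real, non-uniform workloads introduce under spatial grouping, which is the problem the abstract says the proposed architectural modification addresses.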
Author Parcerisa, Joan-Manuel
Gonzalez, Antonio
Joseph, Diya
Aragon, Juan L.
Author_xml – sequence: 1
  givenname: Diya
  surname: Joseph
  fullname: Joseph, Diya
  organization: Universitat Politècnica de Catalunya,Barcelona,Spain
– sequence: 2
  givenname: Juan L.
  surname: Aragon
  fullname: Aragon, Juan L.
  organization: Universidad de Murcia,Murcia,Spain
– sequence: 3
  givenname: Joan-Manuel
  surname: Parcerisa
  fullname: Parcerisa, Joan-Manuel
  organization: Universitat Politècnica de Catalunya,Barcelona,Spain
– sequence: 4
  givenname: Antonio
  surname: Gonzalez
  fullname: Gonzalez, Antonio
  organization: Universitat Politècnica de Catalunya,Barcelona,Spain
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/MICRO56248.2022.00028
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781665462723
1665462728
EndPage 227
ExternalDocumentID 9923874
Genre orig-research
GroupedDBID 6IE
6IL
ACM
ALMA_UNASSIGNED_HOLDINGS
APO
CBEJK
LHSKQ
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 2
IngestDate Wed Aug 27 02:51:45 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
OpenAccessLink http://hdl.handle.net/2117/376016
PageCount 15
ParticipantIDs ieee_primary_9923874
PublicationCentury 2000
PublicationDate 2022-Oct.
PublicationDateYYYYMMDD 2022-10-01
PublicationDate_xml – month: 10
  year: 2022
  text: 2022-Oct.
PublicationDecade 2020
PublicationTitle 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)
PublicationTitleAbbrev MICRO
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib053578255
SourceID ieee
SourceType Publisher
StartPage 213
SubjectTerms Caches
Focusing
GPU
Graphics
Graphics processing units
Instruction sets
Low-power
Microarchitecture
Pipelines
Scheduling
Texture Locality
Upper bound
Title DTexL: Decoupled Raster Pipeline for Texture Locality
URI https://ieeexplore.ieee.org/document/9923874