DTexL: Decoupled Raster Pipeline for Texture Locality


Bibliographic Details
Published in:	2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 213-227
Main Authors:	Joseph, Diya; Aragon, Juan L.; Parcerisa, Joan-Manuel; Gonzalez, Antonio
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.10.2022
Subjects:	Caches; Focusing; GPU; Graphics; Graphics processing units; Instruction sets; Low-power; Microarchitecture; Pipelines; Scheduling; Texture Locality; Upper bound

Abstract	Contemporary GPU architectures have multiple shader cores and a scheduler that distributes work (threads) among them with a focus on load balancing. These load-balancing techniques favor thread distributions that are detrimental to texture memory locality in the L1 texture caches for graphics applications. Texture memory accesses make up the majority of the traffic to the memory hierarchy in typical low-power graphics architectures. This paper improves L1 texture cache locality through a new workload scheduler, exploring various methods to group threads, assign the groups to shader cores, and reorder threads without violating the correctness of the pipeline. To overcome the resulting load imbalance, we also propose a minor modification to the GPU architecture that helps translate the improvement in cache locality into an improvement in the GPU's performance. We propose DTexL, which encompasses these ideas, and evaluate it on a benchmark suite of ten commercial games, obtaining a 46.8% decrease in L2 accesses, a 19.3% increase in performance, and a 6.3% decrease in total GPU energy, all with negligible overhead.
Author affiliations	Joseph, Diya (Universitat Politècnica de Catalunya, Barcelona, Spain); Aragon, Juan L. (Universidad de Murcia, Murcia, Spain); Parcerisa, Joan-Manuel (Universitat Politècnica de Catalunya, Barcelona, Spain); Gonzalez, Antonio (Universitat Politècnica de Catalunya, Barcelona, Spain)
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/MICRO56248.2022.00028
EISBN	9781665462723 1665462728
EndPage	227
ExternalDocumentID	9923874
Genre	orig-research
ISICitedReferencesCount	2
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	true
Language	English
OpenAccessLink	http://hdl.handle.net/2117/376016
PageCount	15
ParticipantIDs	ieee_primary_9923874
PublicationCentury	2000
PublicationDate	2022-Oct.
PublicationDateYYYYMMDD	2022-10-01
PublicationDate_xml	– month: 10 year: 2022 text: 2022-Oct.
PublicationDecade	2020
PublicationTitle	2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)
PublicationTitleAbbrev	MICRO
PublicationYear	2022
Publisher	IEEE
StartPage	213
URI	https://ieeexplore.ieee.org/document/9923874