ACRS: Adjacent Computation Resource Sharing among Partitioned GPU Sub-Cores

Modern GPUs typically segment Streaming Multiprocessors (SMs) into sub-cores (e.g. 4 sub-cores) to reduce power consumption and chip area. However, this partitioned design prevents potential task distributions across sub-cores, impairing overall execution efficiency. In this paper, we explore the pe...

Detailed bibliography
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors: Song, Penghao; Wang, Chongxi; Han, Chenji; Zhao, Haoyu; Zhang, Tingting; Liu, Tianyi; Wang, Jian
Format: Conference paper
Language: English
Published: IEEE, 22.06.2025
Abstract Modern GPUs typically segment Streaming Multiprocessors (SMs) into sub-cores (e.g., 4 sub-cores) to reduce power consumption and chip area. However, this partitioned design prevents tasks from being distributed across sub-cores, impairing overall execution efficiency. In this paper, we explore the performance benefit of sharing hardware resources among sub-cores and identify functional units (FUs) as critical components for compute-intensive applications. Moreover, our observations reveal that instructions residing in operand collectors can be blocked by busy back-end FUs, yet there is a high probability that unoccupied FUs are available in adjacent sub-cores during such blockages. In response, we introduce the adjacent computation resource sharing (ACRS) framework to efficiently utilize these unoccupied units among sub-cores. ACRS has two key modules: Shared FU Issue (SF_ISSUE) and Shared FU Write Back (SF_WriteBack). SF_ISSUE monitors the status of operand collectors and functional units, and offloads instructions from blocked sub-cores to unoccupied resources. Meanwhile, SF_WriteBack routes results back to the original sub-core. To minimize wiring overhead, each sub-core is assigned a fixed target sub-core for sharing. We design a series of matching policies and select the sequential method as the most effective. Evaluation results show that ACRS improves performance by up to 46.4%, with an average of 14.1%, over the traditional partitioned architecture, while reducing energy consumption by 8.3%. In addition, ACRS achieves a further 12.3% performance improvement compared with the state-of-the-art (SOTA) method.
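The issue and write-back flow described in the abstract can be illustrated with a toy, single-cycle model. This is only a sketch under stated assumptions, not the authors' design: the record does not say how the fixed target is chosen or what the sequential matching policy does in detail, so the ring mapping in `fixed_target`, the index-order scan, the one-FU-per-sub-core simplification, and all names (`SubCore`, `sf_issue`, `sf_writeback`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SubCore:
    fu_busy: bool               # is this sub-core's functional unit occupied?
    ready_instr: Optional[str]  # instruction waiting in the operand collector


def fixed_target(i: int, n: int) -> int:
    # Assumption: each sub-core shares with its ring neighbour.
    return (i + 1) % n


def sf_issue(cores: List[SubCore]) -> List[Tuple[int, int, str]]:
    """Return (origin, executing sub-core, instruction) for each issue this cycle."""
    n = len(cores)
    claimed = [c.fu_busy for c in cores]  # FUs that cannot accept work
    issued = []
    # Pass 1: a sub-core uses its own FU whenever that FU is free.
    for i, c in enumerate(cores):
        if c.ready_instr is not None and not claimed[i]:
            claimed[i] = True
            issued.append((i, i, c.ready_instr))
            c.ready_instr = None
    # Pass 2 (assumed "sequential" matching): scan blocked sub-cores in
    # index order and offload to the fixed target if its FU is unclaimed.
    for i, c in enumerate(cores):
        if c.ready_instr is not None:
            t = fixed_target(i, n)
            if not claimed[t]:
                claimed[t] = True
                issued.append((i, t, c.ready_instr))
                c.ready_instr = None
    return issued


def sf_writeback(issued: List[Tuple[int, int, str]]) -> Dict[int, str]:
    # Results are keyed by the originating sub-core, regardless of
    # which sub-core's FU actually executed the instruction.
    return {origin: f"result({instr})" for origin, _, instr in issued}
```

In this model a blocked sub-core makes progress only when its single fixed neighbour is idle, which mirrors the wiring-versus-flexibility trade-off the abstract mentions; an instruction whose neighbour is also busy simply stays in the operand collector.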
Author Han, Chenji
Wang, Jian
Wang, Chongxi
Song, Penghao
Zhao, Haoyu
Zhang, Tingting
Liu, Tianyi
Author_xml – sequence: 1
  givenname: Penghao
  surname: Song
  fullname: Song, Penghao
  email: songpenghao16@mails.ucas.ac.cn
  organization: Institute of Computing Technology, CAS,State Key Lab of Processors,Beijing,China
– sequence: 2
  givenname: Chongxi
  surname: Wang
  fullname: Wang, Chongxi
  email: wangzhongxi15@mails.ucas.ac.cn
  organization: Institute of Computing Technology, CAS,State Key Lab of Processors,Beijing,China
– sequence: 3
  givenname: Chenji
  surname: Han
  fullname: Han, Chenji
  email: hanchenji16@mails.ucas.ac.cn
  organization: Institute of Computing Technology, CAS,State Key Lab of Processors,Beijing,China
– sequence: 4
  givenname: Haoyu
  surname: Zhao
  fullname: Zhao, Haoyu
  email: zhaohaoyu@loongson.cn
  organization: Loongson Technology Co. Ltd,Beijing,China
– sequence: 5
  givenname: Tingting
  surname: Zhang
  fullname: Zhang, Tingting
  email: zhangtingting@loongson.cn
  organization: Loongson Technology Co. Ltd,Beijing,China
– sequence: 6
  givenname: Tianyi
  surname: Liu
  fullname: Liu, Tianyi
  email: tianyi.liu@utsa.edu
  organization: University of Texas at San Antonio,United States
– sequence: 7
  givenname: Jian
  surname: Wang
  fullname: Wang, Jian
  email: jw@ict.ac.cn
  organization: Institute of Computing Technology, CAS,State Key Lab of Processors,Beijing,China
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/DAC63849.2025.11132550
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331503048
EndPage 7
ExternalDocumentID 11132550
Genre orig-research
GroupedDBID 6IE
6IH
CBEJK
RIE
RIO
ID FETCH-LOGICAL-a179t-cdafb171e33c347c84115df79fa6acbd5f47290d3a9461a8b35a054ff9a070db3
IEDL.DBID RIE
IngestDate Wed Oct 01 07:05:15 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a179t-cdafb171e33c347c84115df79fa6acbd5f47290d3a9461a8b35a054ff9a070db3
PageCount 7
ParticipantIDs ieee_primary_11132550
PublicationCentury 2000
PublicationDate 2025-June-22
PublicationDateYYYYMMDD 2025-06-22
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-22
  day: 22
PublicationDecade 2020
PublicationTitle 2025 62nd ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
Score 2.2953556
Snippet Modern GPUs typically segment Streaming Multiprocessors (SMs) into sub-cores (e.g. 4 sub-cores) to reduce power consumption and chip area. However, this...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Computational efficiency
Computer architecture
Graphics processing units
Hardware
Matched filters
Monitoring
Performance gain
Power demand
Resource management
Wiring
Title ACRS: Adjacent Computation Resource Sharing among Partitioned GPU Sub-Cores
URI https://ieeexplore.ieee.org/document/11132550
hasFullText 1
inHoldings 1
isFullTextHit
isPrint