A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale

Bibliographic details
Published in: SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-18
Main authors: Brewer, Wesley; Maiterth, Matthias; Kumar, Vineet; Wojda, Rafal; Bouknight, Sedrick; Hines, Jesse; Shin, Woong; Greenwood, Scott; Grant, David; Williams, Wesley; Wang, Feiyi
Format: Conference proceeding
Language: English
Published: IEEE, 17 November 2024
Online access: Full text
Abstract We present ExaDigiT, an open-source framework for developing comprehensive digital twins of liquid-cooled supercomputers. It integrates three main modules: (1) a resource allocator and power simulator, (2) a transient thermo-fluidic cooling model, and (3) an augmented reality model of the supercomputer and central energy plant. The framework enables the study of "what-if" scenarios, system optimizations, and virtual prototyping of future systems. Using Frontier as a case study, we demonstrate the framework's capabilities by replaying six months of system telemetry for systematic verification and validation. Such a comprehensive analysis of a liquid-cooled exascale supercomputer is the first of its kind. ExaDigiT elucidates complex transient cooling system dynamics, runs synthetic or real workloads, and predicts energy losses due to rectification and voltage conversion. Throughout our paper, we present lessons learned to benefit HPC practitioners developing similar digital twins. We envision the digital twin will be a key enabler for sustainable, energy-efficient supercomputing.
Author Maiterth, Matthias
Wojda, Rafal
Hines, Jesse
Kumar, Vineet
Wang, Feiyi
Shin, Woong
Williams, Wesley
Brewer, Wesley
Greenwood, Scott
Bouknight, Sedrick
Grant, David
Author_xml – sequence: 1
  givenname: Wesley
  surname: Brewer
  fullname: Brewer, Wesley
  email: brewerwh@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 2
  givenname: Matthias
  surname: Maiterth
  fullname: Maiterth, Matthias
  email: maitherthm@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 3
  givenname: Vineet
  surname: Kumar
  fullname: Kumar, Vineet
  email: kumarv@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 4
  givenname: Rafal
  surname: Wojda
  fullname: Wojda, Rafal
  email: wojdarp@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 5
  givenname: Sedrick
  surname: Bouknight
  fullname: Bouknight, Sedrick
  email: bouknightsl@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 6
  givenname: Jesse
  surname: Hines
  fullname: Hines, Jesse
  email: hinesjr@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 7
  givenname: Woong
  surname: Shin
  fullname: Shin, Woong
  email: shinw@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 8
  givenname: Scott
  surname: Greenwood
  fullname: Greenwood, Scott
  email: greenwoodms@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 9
  givenname: David
  surname: Grant
  fullname: Grant, David
  email: grantdr@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 10
  givenname: Wesley
  surname: Williams
  fullname: Williams, Wesley
  email: williamswc@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 11
  givenname: Feiyi
  surname: Wang
  fullname: Wang, Feiyi
  email: fwang2@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/SC41406.2024.00029
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350352917
EndPage 18
ExternalDocumentID 10793218
Genre orig-research
GroupedDBID 6IE
6IL
ACM
ALMA_UNASSIGNED_HOLDINGS
APO
CBEJK
LHSKQ
RIE
RIL
ID FETCH-LOGICAL-a281t-1ce34560a2cfffa2c5d2bf95654e662ce12cba38ac969ea1ffe2f63c8a3feca53
IEDL.DBID RIE
ISICitedReferencesCount 4
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001414891300094&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Jan 01 06:01:57 EST 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a281t-1ce34560a2cfffa2c5d2bf95654e662ce12cba38ac969ea1ffe2f63c8a3feca53
OpenAccessLink https://www.osti.gov/servlets/purl/2479037
PageCount 18
ParticipantIDs ieee_primary_10793218
PublicationCentury 2000
PublicationDate 2024-Nov.-17
PublicationDateYYYYMMDD 2024-11-17
PublicationDate_xml – month: 11
  year: 2024
  text: 2024-Nov.-17
  day: 17
PublicationDecade 2020
PublicationTitle SC24: International Conference for High Performance Computing, Networking, Storage and Analysis
PublicationTitleAbbrev SC
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib057737303
Score 1.9194981
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms augmented reality
Cooling
data center power
Digital twins
electronics cooling
energy efficiency
exascale computing
Optimization
Supercomputers
System dynamics
Systematics
Telemetry
Transient analysis
Virtual prototyping
Voltage
Title A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale
URI https://ieeexplore.ieee.org/document/10793218
WOSCitedRecordID wos001414891300094&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV07T8MwELZoxcAEiCLe8sAaiO3EjkfUhxiqqlJbqVvl2GcUCZLSJIWfj502wMLAYkVeIt1D911y330I3cvYhK5vSAKWqCiIwPghACsCo7i2VsQphWZl_lhMJslyKad7snrDhQGAZvgMHvxj8y_fFLr2n8pchrtocjWpgzpC8B1Zqw2eWAjmopW1xJhQPs76kWsf_BwC9SuyQw8jf0moNBVkdPzPd5-g3g8XD0-_q8wpOoD8DC2e8CB78XofeP6R5XjUjlhhh0HxOHuvMxPoongFg2f1GjZ6L95QYlXiAbx5VOiXRBisKjz8VKVzFfTQYjSc95-DvUBCoGhCqoBoYA4AhYo6u1p3xoam1nU8cQScUw2E6lQ5L2jJJShiLVDLmU4Us6BVzM5RNy9yuEBYcqNFwlJCFI-kcokeeWV0JjV1iIKYS9TzNlmtdzswVq05rv64v0ZH3uyetUfEDepWmxpu0aHeVlm5uWs89wXzBZsP
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV07T8MwELagIMEEiCLeeGANxHZeHlEfKiJUldpK3SrHPqNIkJQmAX4-dkiAhYHFirxEuofuu-S--xC65r5yTd8QOSwSnuOBskMAOnSUCKTWoZ9QqFfmx-F4HC0WfNKQ1WsuDADUw2dwYx_rf_kql5X9VGYy3ESTqUmbaMtKZzV0rTZ8_DBkJl5ZS41x-e2055kGwk4iULsk27VA8peISl1Dhnv_fPs-6v6w8fDku84coA3IDtH8DvfTJ6v4gWfvaYaH7ZAVNigUx-lrlSpH5vkzKDytVrCWjXxDgUWB-_BicaFdE6GwKPHgQxTGWdBF8-Fg1hs5jUSCI2hESodIYAYCuYIay2pz-oom2vQ8vgdBQCUQKhNh_CB5wEEQrYHqgMlIMA1S-OwIdbI8g2OEeaBkGLGEEBF4XJhU96w2OuOSGkxB1AnqWpssV19bMJatOU7_uL9CO6PZY7yM78cPZ2jXusBy-Eh4jjrluoILtC3fyrRYX9Ze_AQ25Z5Y
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=SC24%3A+International+Conference+for+High+Performance+Computing%2C+Networking%2C+Storage+and+Analysis&rft.atitle=A+Digital+Twin+Framework+for+Liquid-cooled+Supercomputers+as+Demonstrated+at+Exascale&rft.au=Brewer%2C+Wesley&rft.au=Maiterth%2C+Matthias&rft.au=Kumar%2C+Vineet&rft.au=Wojda%2C+Rafal&rft.date=2024-11-17&rft.pub=IEEE&rft.spage=1&rft.epage=18&rft_id=info:doi/10.1109%2FSC41406.2024.00029&rft.externalDocID=10793218