Thermal Model Identification of Computing Nodes in High-Performance Computing Systems

Bibliographic Details
Published in: IEEE Transactions on Industrial Electronics (1982), Vol. 67, no. 9, pp. 7778-7788
Main Authors: Diversi, Roberto, Bartolini, Andrea, Benini, Luca
Format: Journal Article
Language: English
Published: New York IEEE 01.09.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0278-0046, 1557-9948
Description
Summary: Thermal-aware design and online optimization of the cooling effort are becoming increasingly important in current and future high-performance computing (HPC) systems. Developing such techniques effectively requires distributed and compact models of the system's thermal behavior. System identification algorithms make it possible to extract such models directly from the thermal response of the target device. This article proposes a novel thermal identification approach for real, in-production HPC systems. It can extract thermal models from a computing node whose temperature measurements are affected by quantization noise and that operates in free-cooling mode with variable ambient temperature. The approach also makes it possible to identify the physical floorplan of the CPU dies in supercomputing nodes. The effectiveness of the proposed methodology has been tested on a node of the CINECA Galileo Tier-1 supercomputer system.
DOI: 10.1109/TIE.2019.2945277