Experiences in Managing High-performance Computing Management and Support Tools while Upgrading a Campus Cluster
The Triton Shared Computing Cluster (TSCC) [1] is the San Diego Supercomputer Center ("Center" in the remaining text)'s primary campus research computing system. This paper describes the transition from TSCC 1.0 to TSCC 2.0, focusing on the implementation of new high-performance compu...
Uloženo v:
| Vydáno v: | SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis s. 607 - 612 |
|---|---|
| Hlavní autoři: | , , , , , , |
| Médium: | Konferenční příspěvek |
| Jazyk: | angličtina |
| Vydáno: |
IEEE
17.11.2024
|
| Témata: | |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | The Triton Shared Computing Cluster (TSCC) [1] is the San Diego Supercomputer Center ("Center" in the remaining text)'s primary campus research computing system. This paper describes the transition from TSCC 1.0 to TSCC 2.0, focusing on the implementation of new high-performance computing (HPC) infrastructure components and management strategies. We detail our approach to overcoming challenges posed by node heterogeneity, enhancing job scheduling efficiency, and improving resource allocation and billing fairness.The legacy TSCC 1.0 is described first, focusing on some critical issues we want to solve under TSCC 2.0. The HPC tools under TSCC 2.0 are then described. Lastly, the best practices and experiences learned are discussed. |
|---|---|
| DOI: | 10.1109/SCW63240.2024.00084 |