Experiences in Managing High-performance Computing Management and Support Tools while Upgrading a Campus Cluster
| Published in: | SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 607-612 |
|---|---|
| Main authors: | Chen, Yuwu; Cooper, Trevor; Irving, Christopher; Tatineni, Mahidhar; Wolter, Nicole; Mishin, Dmitry; Sivagnanam, Subhashini |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 17 November 2024 |
| Abstract | The Triton Shared Computing Cluster (TSCC) [1] is the primary campus research computing system of the San Diego Supercomputer Center ("Center" in the remaining text). This paper describes the transition from TSCC 1.0 to TSCC 2.0, focusing on the implementation of new high-performance computing (HPC) infrastructure components and management strategies. We detail our approach to overcoming challenges posed by node heterogeneity, enhancing job scheduling efficiency, and improving resource allocation and billing fairness. The legacy TSCC 1.0 is described first, focusing on some critical issues we want to solve under TSCC 2.0. The HPC tools under TSCC 2.0 are then described. Lastly, the best practices and experiences learned are discussed. |
|---|---|
| Author | Chen, Yuwu; Sivagnanam, Subhashini; Irving, Christopher; Wolter, Nicole; Mishin, Dmitry; Tatineni, Mahidhar; Cooper, Trevor |
| Author details | 1. Chen, Yuwu (ychen64@sdsc.edu); 2. Cooper, Trevor (tcooper@sdsc.edu); 3. Irving, Christopher (cirving@sdsc.edu); 4. Tatineni, Mahidhar (mahidhar@sdsc.edu); 5. Wolter, Nicole (nickel@sdsc.edu); 6. Mishin, Dmitry (dmishin@sdsc.edu); 7. Sivagnanam, Subhashini (sivagnan@sdsc.edu). Affiliation of all authors: University of California San Diego, San Diego Supercomputer Center, San Diego, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/SCW63240.2024.00084 |
| EISBN | 9798350355543 |
| EndPage | 612 |
| ExternalDocumentID | 10820654 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 6 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Nov.-17 |
| PublicationDateYYYYMMDD | 2024-11-17 |
| PublicationDecade | 2020 |
| PublicationTitle | SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis |
| PublicationTitleAbbrev | SC-W |
| PublicationYear | 2024 |
| Publisher | IEEE |
| StartPage | 607 |
| SubjectTerms | Best practices; campus cluster; Conferences; Focusing; High performance computing; HPC; Processor scheduling; Resource management; Supercomputers; upgrade; User support |
| Title | Experiences in Managing High-performance Computing Management and Support Tools while Upgrading a Campus Cluster |
| URI | https://ieeexplore.ieee.org/document/10820654 |