Experiences in Managing High-performance Computing Management and Support Tools while Upgrading a Campus Cluster
The Triton Shared Computing Cluster (TSCC) [1] is the San Diego Supercomputer Center ("Center" in the remaining text)'s primary campus research computing system. This paper describes the transition from TSCC 1.0 to TSCC 2.0, focusing on the implementation of new high-performance compu...
| Published in: | SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 607-612 |
|---|---|
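| Main authors: | Chen, Yuwu; Cooper, Trevor; Irving, Christopher; Tatineni, Mahidhar; Wolter, Nicole; Mishin, Dmitry; Sivagnanam, Subhashini |
| Format: | Conference paper |
| Language: | English |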
| Publication details: | IEEE, 17 November 2024 |
| Subjects: | |
| Abstract | The Triton Shared Computing Cluster (TSCC) [1] is the San Diego Supercomputer Center ("Center" in the remaining text)'s primary campus research computing system. This paper describes the transition from TSCC 1.0 to TSCC 2.0, focusing on the implementation of new high-performance computing (HPC) infrastructure components and management strategies. We detail our approach to overcoming challenges posed by node heterogeneity, enhancing job scheduling efficiency, and improving resource allocation and billing fairness. The legacy TSCC 1.0 is described first, focusing on some critical issues we want to solve under TSCC 2.0. The HPC tools under TSCC 2.0 are then described. Lastly, the best practices and experiences learned are discussed. |
|---|---|
| Author | Chen, Yuwu; Sivagnanam, Subhashini; Irving, Christopher; Wolter, Nicole; Mishin, Dmitry; Tatineni, Mahidhar; Cooper, Trevor |
| Author_xml | 1. Chen, Yuwu (ychen64@sdsc.edu); 2. Cooper, Trevor (tcooper@sdsc.edu); 3. Irving, Christopher (cirving@sdsc.edu); 4. Tatineni, Mahidhar (mahidhar@sdsc.edu); 5. Wolter, Nicole (nickel@sdsc.edu); 6. Mishin, Dmitry (dmishin@sdsc.edu); 7. Sivagnanam, Subhashini (sivagnan@sdsc.edu). Organization for all authors: University of California San Diego, San Diego Supercomputer Center, San Diego, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/SCW63240.2024.00084 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Xplore IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798350355543 |
| EndPage | 612 |
| ExternalDocumentID | 10820654 |
| Genre | orig-research |
| ISICitedReferencesCount | 0 |
| IngestDate | Wed Aug 27 01:59:34 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 6 |
| ParticipantIDs | ieee_primary_10820654 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Nov.-17 |
| PublicationDateYYYYMMDD | 2024-11-17 |
| PublicationDate_xml | – month: 11 year: 2024 text: 2024-Nov.-17 day: 17 |
| PublicationDecade | 2020 |
| PublicationTitle | SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis |
| PublicationTitleAbbrev | SC-W |
| PublicationYear | 2024 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 607 |
| SubjectTerms | Best practices; campus cluster; Conferences; Focusing; High performance computing; HPC; Processor scheduling; Resource management; Supercomputers; upgrade; User support |
| Title | Experiences in Managing High-performance Computing Management and Support Tools while Upgrading a Campus Cluster |
| URI | https://ieeexplore.ieee.org/document/10820654 |