Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine
High energy physics experiments produce petabytes of data annually that must be reduced to gain insight into the laws of nature. Early-stage reduction executes long-running, high-throughput workflows across thousands of nodes spanning multiple facilities to produce shared datasets. Later stages are...
| Published in: | SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13 |
|---|---|
| Main authors: | Sly-Delgado, Barry; Tovar, Ben; Zhou, Jin; Thain, Douglas |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 17 November 2024 |
| Subjects: | |
| Abstract | High energy physics experiments produce petabytes of data annually that must be reduced to gain insight into the laws of nature. Early-stage reduction executes long-running, high-throughput workflows across thousands of nodes spanning multiple facilities to produce shared datasets. Later stages are typically written by individuals or small groups and must be refined and re-run many times for correctness. Reducing iteration times of later stages is key to accelerating discovery. We demonstrate our experience reshaping late-stage analysis applications on thousands of nodes. It is not enough merely to increase scale: it is necessary to make changes throughout the stack, including storage systems, data management, task scheduling, and application design. We demonstrate these changes when applied to two analysis applications built on open source data analysis frameworks (Coffea, Dask, TaskVine). We evaluate the performance of the applications on opportunistic campus clusters, showing effective scaling up to 7200 cores, thus producing significant speedup. |
|---|---|
| Author | Zhou, Jin; Sly-Delgado, Barry; Tovar, Ben; Thain, Douglas |
| Author_xml | 1. Sly-Delgado, Barry (bslydelg@nd.edu), University of Notre Dame, South Bend, U.S.A.; 2. Tovar, Ben (btovar@nd.edu), University of Notre Dame, South Bend, U.S.A.; 3. Zhou, Jin (jzhou24@nd.edu), University of Notre Dame, South Bend, U.S.A.; 4. Thain, Douglas (dthain@nd.edu), University of Notre Dame, South Bend, U.S.A. |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/SC41406.2024.00068 |
| EISBN | 9798350352917 |
| EndPage | 13 |
| ExternalDocumentID | 10793142 |
| Genre | orig-research |
| ISICitedReferencesCount | 0 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 13 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Nov.-17 |
| PublicationDateYYYYMMDD | 2024-11-17 |
| PublicationDate_xml | – month: 11 year: 2024 text: 2024-Nov.-17 day: 17 |
| PublicationDecade | 2020 |
| PublicationTitle | SC24: International Conference for High Performance Computing, Networking, Storage and Analysis |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2024 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Data analysis; Data Transfer; Hardware; High energy physics; High performance computing; Libraries; Optimization; Parallel Programming; Peer-to-peer computing; Physics Computing; Python; Schedules; Scientific Computing; Stress |
| Title | Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine |
| URI | https://ieeexplore.ieee.org/document/10793142 |