Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine

Bibliographic Details
Published in: SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13
Main authors: Sly-Delgado, Barry; Tovar, Ben; Zhou, Jin; Thain, Douglas
Format: Conference paper
Language: English
Published: IEEE, 17 November 2024
Abstract High energy physics experiments produce petabytes of data annually that must be reduced to gain insight into the laws of nature. Early-stage reduction executes long-running, high-throughput workflows across thousands of nodes spanning multiple facilities to produce shared datasets. Later stages are typically written by individuals or small groups and must be refined and re-run many times for correctness. Reducing iteration times of later stages is key to accelerating discovery. We demonstrate our experience reshaping late-stage analysis applications on thousands of nodes. It is not enough merely to increase scale: it is necessary to make changes throughout the stack, including storage systems, data management, task scheduling, and application design. We demonstrate these changes when applied to two analysis applications built on open source data analysis frameworks (Coffea, Dask, TaskVine). We evaluate the performance of the applications on opportunistic campus clusters, showing effective scaling up to 7200 cores, thus producing significant speedup.
Author Zhou, Jin
Sly-Delgado, Barry
Tovar, Ben
Thain, Douglas
Author_xml – sequence: 1
  givenname: Barry
  surname: Sly-Delgado
  fullname: Sly-Delgado, Barry
  email: bslydelg@nd.edu
  organization: University of Notre Dame, South Bend, U.S.A.
– sequence: 2
  givenname: Ben
  surname: Tovar
  fullname: Tovar, Ben
  email: btovar@nd.edu
  organization: University of Notre Dame, South Bend, U.S.A.
– sequence: 3
  givenname: Jin
  surname: Zhou
  fullname: Zhou, Jin
  email: jzhou24@nd.edu
  organization: University of Notre Dame, South Bend, U.S.A.
– sequence: 4
  givenname: Douglas
  surname: Thain
  fullname: Thain, Douglas
  email: dthain@nd.edu
  organization: University of Notre Dame, South Bend, U.S.A.
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/SC41406.2024.00068
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350352917
EndPage 13
ExternalDocumentID 10793142
Genre orig-research
GroupedDBID 6IE
6IL
ACM
ALMA_UNASSIGNED_HOLDINGS
APO
CBEJK
LHSKQ
RIE
RIL
ID FETCH-LOGICAL-a254t-a38d98fc30f7b1de9908c8fd6a601a892c642afda5ee5a058d8468d00cc0c2993
IEDL.DBID RIE
ISICitedReferencesCount 0
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001414891300024&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Jan 01 06:01:57 EST 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a254t-a38d98fc30f7b1de9908c8fd6a601a892c642afda5ee5a058d8468d00cc0c2993
PageCount 13
ParticipantIDs ieee_primary_10793142
PublicationCentury 2000
PublicationDate 2024-Nov.-17
PublicationDateYYYYMMDD 2024-11-17
PublicationDate_xml – month: 11
  year: 2024
  text: 2024-Nov.-17
  day: 17
PublicationDecade 2020
PublicationTitle SC24: International Conference for High Performance Computing, Networking, Storage and Analysis
PublicationTitleAbbrev SC
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib057737303
Score 1.8896517
Snippet High energy physics experiments produce petabytes of data annually that must be reduced to gain insight into the laws of nature. Early-stage reduction executes...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Data analysis
Data Transfer
Hardware
High energy physics
High performance computing
Libraries
Optimization
Parallel Programming
Peer-to-peer computing
Physics Computing
Python
Schedules
Scientific Computing
Stress
Title Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine
URI https://ieeexplore.ieee.org/document/10793142
WOSCitedRecordID wos001414891300024&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint