Data Flow Lifecycles for Optimizing Workflow Coordination

Detailed bibliography
Published in: International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-15
Main authors: Lee, Hyungro; Guo, Luanzheng; Tang, Meng; Firoz, Jesun; Tallent, Nathan R.; Kougkas, Anthony; Sun, Xian-He
Medium: Conference paper
Language: English
Publisher: ACM, 11 November 2023
ISSN: 2167-4337
Abstract: A critical performance challenge in distributed scientific workflows is coordinating tasks and data flows on distributed resources. To guide these decisions, this paper introduces data flow lifecycle analysis. Workflows are commonly represented using directed acyclic graphs (DAGs). Data flow lifecycles (DFL) enrich task DAGs with data objects and properties that describe data flow and how tasks interact with that flow. Lifecycles enable analysis from several important perspectives: task, data, and data flow. We describe representation, measurement, analysis, visualization, and opportunity identification for DFLs. Our measurement is both distributed and scalable, using space that is constant per data file. We use lifecycles and opportunity analysis to reason about improved task placement and reduced data movement for five scientific workflows with different characteristics. Case studies show improvements of 15×, 1.9×, and 10-30×. Our work is implemented in the DataLife tool.
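The abstract describes a lifecycle as a task DAG enriched with data objects, measured with bounded space per data file. A minimal Python sketch of that idea follows, under stated assumptions: tasks and files are plain strings, each read or write event is folded into a fixed-size counter record keyed by (task, file) instead of being appended to a log, and the task-to-task DAG is recovered by joining each file's producers with its consumers. The names `FlowStats`, `DataFlowLifecycle`, `record`, `producers`, `consumers`, and `task_edges` are hypothetical and do not describe DataLife's actual API or data model.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical sketch: class names, methods, and counters are illustrative
# assumptions, not DataLife's API or measurement schema.


@dataclass
class FlowStats:
    """Fixed-size counters for one (task, file) edge in one direction."""
    ops: int = 0
    nbytes: int = 0


class DataFlowLifecycle:
    """Task DAG enriched with file vertices.

    Writes are task -> file edges; reads are file -> task edges.
    """

    def __init__(self) -> None:
        self.writes: defaultdict[tuple[str, str], FlowStats] = defaultdict(FlowStats)
        self.reads: defaultdict[tuple[str, str], FlowStats] = defaultdict(FlowStats)

    def record(self, task: str, path: str, op: str, nbytes: int) -> None:
        """Fold one I/O event into constant-size counters (no event log kept)."""
        if op == "write":
            stats = self.writes[(task, path)]
        elif op == "read":
            stats = self.reads[(task, path)]
        else:
            raise ValueError(f"unknown op: {op!r}")
        stats.ops += 1
        stats.nbytes += nbytes

    def producers(self, path: str) -> set[str]:
        """Tasks that wrote the file."""
        return {t for (t, p) in self.writes if p == path}

    def consumers(self, path: str) -> set[str]:
        """Tasks that read the file."""
        return {t for (t, p) in self.reads if p == path}

    def task_edges(self) -> set[tuple[str, str]]:
        """Recover producer -> consumer task edges through shared files."""
        edges: set[tuple[str, str]] = set()
        for consumer, path in self.reads:
            for producer in self.producers(path):
                if producer != consumer:  # skip a task re-reading its own output
                    edges.add((producer, consumer))
        return edges


if __name__ == "__main__":
    dfl = DataFlowLifecycle()
    dfl.record("simulate", "traj.h5", "write", 4096)
    dfl.record("simulate", "traj.h5", "write", 4096)
    dfl.record("analyze", "traj.h5", "read", 8192)
    dfl.record("plot", "traj.h5", "read", 1024)
    print(sorted(dfl.task_edges()))  # [('simulate', 'analyze'), ('simulate', 'plot')]
    print(dfl.writes[("simulate", "traj.h5")])  # FlowStats(ops=2, nbytes=8192)
```

Keeping counters rather than an event trace is what bounds memory in this sketch: it grows with the number of task-file pairs, not with the number of I/O operations. Merging records collected on different nodes of a distributed run is outside the scope of this example.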
Author_xml – sequence: 1
  givenname: Hyungro
  surname: Lee
  fullname: Lee, Hyungro
  email: hyungro.lee@pnnl.gov
  organization: Pacific Northwest National Laboratory, Richland, Washington, USA
– sequence: 2
  givenname: Luanzheng
  surname: Guo
  fullname: Guo, Luanzheng
  email: lenny.guo@pnnl.gov
  organization: Pacific Northwest National Laboratory, Richland, Washington, USA
– sequence: 3
  givenname: Meng
  surname: Tang
  fullname: Tang, Meng
  email: mtang11@hawk.iit.edu
  organization: Illinois Institute of Technology, Chicago, Illinois, USA
– sequence: 4
  givenname: Jesun
  surname: Firoz
  fullname: Firoz, Jesun
  email: jesun.firoz@pnnl.gov
  organization: Pacific Northwest National Laboratory, Seattle, Washington, USA
– sequence: 5
  givenname: Nathan R.
  surname: Tallent
  fullname: Tallent, Nathan R.
  email: tallent@pnnl.gov
  organization: Pacific Northwest National Laboratory, Richland, Washington, USA
– sequence: 6
  givenname: Anthony
  surname: Kougkas
  fullname: Kougkas, Anthony
  email: akougkas@iit.edu
  organization: Illinois Institute of Technology, Chicago, Illinois, USA
– sequence: 7
  givenname: Xian-He
  surname: Sun
  fullname: Sun, Xian-He
  email: sun@iit.edu
  organization: Illinois Institute of Technology, Chicago, Illinois, USA
ContentType Conference Proceeding
DOI 10.1145/3581784.3607104
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798400701092
EISSN 2167-4337
EndPage 15
ExternalDocumentID 10485036
Genre orig-research
GrantInformation_xml – fundername: Pacific Northwest National Laboratory
  funderid: 10.13039/100011661
IngestDate Wed Aug 27 02:09:35 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 15
PublicationCentury 2000
PublicationDate 2023-Nov.-11
PublicationDateYYYYMMDD 2023-11-11
PublicationDate_xml – month: 11
  year: 2023
  text: 2023-Nov.-11
  day: 11
PublicationDecade 2020
PublicationTitle International Conference for High Performance Computing, Networking, Storage and Analysis (Online)
PublicationTitleAbbrev SC
PublicationYear 2023
Publisher ACM
Publisher_xml – name: ACM
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms caterpillar tree
data flow lifecycles
Data visualization
Distributed databases
distributed workflows
Instruments
performance analysis
Processor scheduling
Quality of service
storage bottlenecks
Throughput
Volume measurement
Title Data Flow Lifecycles for Optimizing Workflow Coordination
URI https://ieeexplore.ieee.org/document/10485036