Data Flow Lifecycles for Optimizing Workflow Coordination
A critical performance challenge in distributed scientific workflows is coordinating tasks and data flows on distributed resources. To guide these decisions, this paper introduces data flow lifecycle analysis. Workflows are commonly represented using directed acyclic graphs (DAGs). Data flow lifecyc...
| Published in: | International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-15 |
|---|---|
| Main authors: | Lee, Hyungro; Guo, Luanzheng; Tang, Meng; Firoz, Jesun; Tallent, Nathan R.; Kougkas, Anthony; Sun, Xian-He |
| Format: | Conference paper |
| Language: | English |
| Publication details: | ACM, 11 November 2023 |
| Subjects: | |
| ISSN: | 2167-4337 |
| Online access: | Get full text |
| Abstract | A critical performance challenge in distributed scientific workflows is coordinating tasks and data flows on distributed resources. To guide these decisions, this paper introduces data flow lifecycle analysis. Workflows are commonly represented using directed acyclic graphs (DAGs). Data flow lifecycles (DFL) enrich task DAGs with data objects and properties that describe data flow and how tasks interact with that flow. Lifecycles enable analysis from several important perspectives: task, data, and data flow. We describe representation, measurement, analysis, visualization, and opportunity identification for DFLs. Our measurement is both distributed and scalable, using space that is constant per data file. We use lifecycles and opportunity analysis to reason about improved task placement and reduced data movement for five scientific workflows with different characteristics. Case studies show improvements of 15x, 1.9x, and 10-30x. Our work is implemented in the DataLife tool. |
|---|---|
| Author | Kougkas, Anthony; Lee, Hyungro; Firoz, Jesun; Tallent, Nathan R.; Guo, Luanzheng; Tang, Meng; Sun, Xian-He |
| Author_xml | 1. Lee, Hyungro (hyungro.lee@pnnl.gov), Pacific Northwest National Laboratory, Richland, Washington, USA; 2. Guo, Luanzheng (lenny.guo@pnnl.gov), Pacific Northwest National Laboratory, Richland, Washington, USA; 3. Tang, Meng (mtang11@hawk.iit.edu), Illinois Institute of Technology, Chicago, Illinois, USA; 4. Firoz, Jesun (jesun.firoz@pnnl.gov), Pacific Northwest National Laboratory, Seattle, Washington, USA; 5. Tallent, Nathan R. (tallent@pnnl.gov), Pacific Northwest National Laboratory, Richland, Washington, USA; 6. Kougkas, Anthony (akougkas@iit.edu), Illinois Institute of Technology, Chicago, Illinois, USA; 7. Sun, Xian-He (sun@iit.edu), Illinois Institute of Technology, Chicago, Illinois, USA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3581784.3607104 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400701092 |
| EISSN | 2167-4337 |
| EndPage | 15 |
| ExternalDocumentID | 10485036 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: Pacific Northwest National Laboratory funderid: 10.13039/100011661 |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL |
| ID | FETCH-LOGICAL-a233t-5bf507cf0e8ca426e1e2d9392a81b9546a1c677bd2684f0fe629797f11bfaeb43 |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 02:09:35 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a233t-5bf507cf0e8ca426e1e2d9392a81b9546a1c677bd2684f0fe629797f11bfaeb43 |
| PageCount | 15 |
| ParticipantIDs | ieee_primary_10485036 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Nov.-11 |
| PublicationDateYYYYMMDD | 2023-11-11 |
| PublicationDate_xml | – month: 11 year: 2023 text: 2023-Nov.-11 day: 11 |
| PublicationDecade | 2020 |
| PublicationTitle | International Conference for High Performance Computing, Networking, Storage and Analysis (Online) |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2023 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0003204180 ssib053141430 |
| Score | 1.8653219 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | caterpillar tree; data flow lifecycles; Data visualization; Distributed databases; distributed workflows; Instruments; performance analysis; Processor scheduling; Quality of service; storage bottlenecks; Throughput; Volume measurement |
| Title | Data Flow Lifecycles for Optimizing Workflow Coordination |
| URI | https://ieeexplore.ieee.org/document/10485036 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |