Data Flow Lifecycles for Optimizing Workflow Coordination


Bibliographic details
Published in: International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-15
Main authors: Lee, Hyungro; Guo, Luanzheng; Tang, Meng; Firoz, Jesun; Tallent, Nathan R.; Kougkas, Anthony; Sun, Xian-He
Format: Conference proceeding
Language: English
Published: ACM, 11 November 2023
ISSN: 2167-4337
Description
Summary: A critical performance challenge in distributed scientific workflows is coordinating tasks and data flows on distributed resources. To guide these decisions, this paper introduces data flow lifecycle analysis. Workflows are commonly represented using directed acyclic graphs (DAGs). Data flow lifecycles (DFL) enrich task DAGs with data objects and properties that describe data flow and how tasks interact with that flow. Lifecycles enable analysis from several important perspectives: task, data, and data flow. We describe representation, measurement, analysis, visualization, and opportunity identification for DFLs. Our measurement is both distributed and scalable, using space that is constant per data file. We use lifecycles and opportunity analysis to reason about improved task placement and reduced data movement for five scientific workflows with different characteristics. Case studies show improvements of 15x, 1.9x, and 10-30x. Our work is implemented in the DataLife tool.
DOI: 10.1145/3581784.3607104