Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis


Bibliographic details
Published in: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Vol. 2023, pp. 868-877
Main authors: Collier, Nicholson; Wozniak, Justin M.; Stevens, Abby; Babuji, Yadu; Binois, Mickael; Fadikar, Arindam; Wurth, Alexandra; Chard, Kyle; Ozik, Jonathan
Format: Conference paper; Journal Article
Language: English
Publication details: United States, IEEE, 01.05.2023
ISSN: 2164-7062
Description
Summary: COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the broad response to it from the scientific community have forged new relationships among domain experts, mathematical modelers, and scientific computing specialists. Computationally, however, the pandemic also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These gaps include gaining access to scalable computing systems, porting models and workflows to new systems, sharing data of varying sizes, and producing results that can be reproduced and validated by others. Informed by our team's work in supporting public health decision makers during the COVID-19 pandemic and by the identified capability gaps in applying high-performance computing (HPC) to the modeling of complex social systems, we present the goals, requirements, and initial implementation of OSPREY, an open science platform for robust epidemic analysis. The prototype implementation demonstrates an integrated, algorithm-driven HPC workflow architecture that coordinates tasks across federated HPC resources, with robust, secure, and automated access to each resource. We demonstrate scalable and fault-tolerant task execution, an asynchronous API to support fast time-to-solution algorithms, an inclusive, multi-language approach, and efficient wide-area data management. The example OSPREY code is available in a public repository.
DOI: 10.1109/IPDPSW59300.2023.00143