Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems


Detailed bibliography
Published in: Computational and Structural Biotechnology Journal, Volume 21; pp. 2075-2085
Main authors: Djaffardjy, Marine; Marchment, George; Sebe, Clémence; Blanchet, Raphaël; Belhajjame, Khalid; Gaignard, Alban; Lemoine, Frédéric; Cohen-Boulakia, Sarah
Format: Journal Article
Language: English
Published: Netherlands, Elsevier B.V., 01.01.2023
Elsevier
Research Network of Computational and Structural Biotechnology
ISSN: 2001-0370
Description
Summary: Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analysis and experiments. While scripting languages, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are by no means enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and provides a mapping of existing solutions that fulfill them. We then highlight the benefits of using scientific workflow systems to get modular, reproducible and reusable bioinformatics data analysis pipelines. We finally discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows.
• Elicitation of the problems faced when designing large-scale bioinformatics pipelines.
• Review of existing solutions for developing reusable bioinformatics pipelines.
• Quantitative and qualitative study on current reuse of bioinformatics workflow systems.
Bibliography: PMCID: PMC10030817
DOI: 10.1016/j.csbj.2023.03.003