Toward Long-Term and Archivable Reproducibility

Bibliographic details
Published in: Computing in Science & Engineering, Vol. 23, No. 3, pp. 82-91
Main authors: Akhlaghi, Mohammad; Infante-Sainz, Raul; Roukema, Boudewijn F.; Khellat, Mohammadreza; Valls-Gabaud, David; Baena-Galle, Roberto; Barba, Lorena A.; Gesing, Sandra
Format: Journal Article
Language: English
Published: New York: IEEE, 1 May 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Institute of Electrical and Electronics Engineers
ISSN: 1521-9615, 1558-366X
Description
Summary: Analysis pipelines commonly use high-level technologies that are popular when created, but are unlikely to be readable, executable, or sustainable in the long term. A set of criteria is introduced to address this problem: completeness (no execution requirement beyond a minimal Unix-like operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free and open-source software. As a proof of concept, we introduce “Maneage” (managing data lineage), enabling cheap archiving, provenance extraction, and peer verification that has been tested in several research publications. We show that longevity is a realistic requirement that does not sacrifice immediate or short-term reproducibility. The caveats (with proposed solutions) are then discussed and we conclude with the benefits for the various stakeholders. This article is itself a Maneage'd project (project commit 313db0b).

Appendices: Two comprehensive appendices that review the longevity of existing solutions are available as supplementary “Web extras” in the IEEE Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/MCSE.2021.3072860.

Reproducibility: All products are available in zenodo.4913277; the Git history of this paper's source is at git.maneage.org/paper-concept.git, which is also archived in Software Heritage: swh:1:dir:33fea87068c1612daf011f161b97787b9a0df39f. Clicking on the SWHIDs in the digital format will provide more “context” for the same content.
DOI: 10.1109/MCSE.2021.3072860