CAMP: A modular metagenomics analysis system for integrated multi-step data exploration.

Detailed Bibliography
Title: CAMP: A modular metagenomics analysis system for integrated multi-step data exploration.
Authors:
Mak L; Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, NY, USA.; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.
Tierney B; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.; Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY, USA.
Wei W; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.
Ronkowski C; Titus Family Department of Clinical Pharmacy, University of Southern California, CA, USA.
Toscan RB; Małopolska Centre of Biotechnology, Jagiellonian University, Kraków, Poland.
Turhan B; Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, NY, USA.; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.
Toomey M; Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, NY, USA.; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.
Martinez JSA; Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, NY, USA.; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.
Fu C; Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, NY, USA.; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.
Lucaci AG; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.; Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY, USA.
Solano AHB; Department of Biochemistry, Institute of Chemistry, Universidade de São Paulo, São Paulo, Brazil.
Setubal JC; Department of Biochemistry, Institute of Chemistry, Universidade de São Paulo, São Paulo, Brazil.
Henriksen JR; Natural Resource Ecology Laboratory, Colorado State University, CO, USA.; Two Frontiers Project, State, Country.
Zimmerman S; Broad Institute of MIT and Harvard, MA, USA.
Kopbayeva M; Mathematics Institute, University of Warwick, Coventry, UK, Nazarbayev Intellectual School of Physics and Math, Almaty, Kazakhstan.
Noyvert A; Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, NY, USA.; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.; Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY, USA.; Titus Family Department of Clinical Pharmacy, University of Southern California, CA, USA.; Małopolska Centre of Biotechnology, Jagiellonian University, Kraków, Poland.; Department of Biochemistry, Institute of Chemistry, Universidade de São Paulo, São Paulo, Brazil.; Natural Resource Ecology Laboratory, Colorado State University, CO, USA.; Two Frontiers Project, State, Country.; Broad Institute of MIT and Harvard, MA, USA.; Mathematics Institute, University of Warwick, Coventry, UK, Nazarbayev Intellectual School of Physics and Math, Almaty, Kazakhstan.; School of Molecular and Theoretical Biology, Tartu, Estonia.; Institute of Molecular Biology and Genetics, NASU, Kyiv, Ukraine.; Kyiv Academic University, Kyiv, Ukraine.; Taras Shevchenko National University, Kyiv, Ukraine.; ETH, Zurich, Switzerland.; Biotia, NY, USA.; GeoSeeq Foundation, NY, USA.; Department of Biology, Lund University, Sweden.; WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine of Cornell University, NY, USA.; The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine of Cornell University, NY, USA and Englander Institute for Precision Medicine, Weill Cornell Medicine of Cornell University, NY, USA.
Iwan Z; School of Molecular and Theoretical Biology, Tartu, Estonia.
Kar S; School of Molecular and Theoretical Biology, Tartu, Estonia.
Nakazawa N; School of Molecular and Theoretical Biology, Tartu, Estonia.
Meleshko D; Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, NY, USA.; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.
Horyslavets D; Institute of Molecular Biology and Genetics, NASU, Kyiv, Ukraine.; Kyiv Academic University, Kyiv, Ukraine.
Kantsypa V; Institute of Molecular Biology and Genetics, NASU, Kyiv, Ukraine.; Taras Shevchenko National University, Kyiv, Ukraine.
Frolova A; Institute of Molecular Biology and Genetics, NASU, Kyiv, Ukraine.; Kyiv Academic University, Kyiv, Ukraine.
Kahles A; ETH, Zurich, Switzerland.
Danko D; Biotia, NY, USA.; GeoSeeq Foundation, NY, USA.
Elhaik E; Department of Biology, Lund University, Sweden.
Labaj P; Department of Biochemistry, Institute of Chemistry, Universidade de São Paulo, São Paulo, Brazil.
Mangul S; Titus Family Department of Clinical Pharmacy, University of Southern California, CA, USA.; Małopolska Centre of Biotechnology, Jagiellonian University, Kraków, Poland.
Mason CE; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.; Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY, USA.; Biotia, NY, USA.; WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine of Cornell University, NY, USA.; The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine of Cornell University, NY, USA and Englander Institute for Precision Medicine, Weill Cornell Medicine of Cornell University, NY, USA.
Hajirasouliha I; Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, NY, USA.; Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY, USA.
Corporate Author: International MetaSUB Consortium
Source: BioRxiv : the preprint server for biology [bioRxiv] 2025 Apr 21. Date of Electronic Publication: 2025 Apr 21.
Publication Type: Journal Article; Preprint
Language: English
Journal Info: Country of Publication: United States NLM ID: 101680187 Publication Model: Electronic Cited Medium: Internet ISSN: 2692-8205 (Electronic) Linking ISSN: 26928205 NLM ISO Abbreviation: bioRxiv Subsets: PubMed not MEDLINE
Abstract: Motivation: Computational analysis of large-scale metagenomics sequencing datasets has proven incredibly valuable for extracting isolate-level taxonomic and functional insights from complex microbial communities. However, due to an ever-expanding ecosystem of metagenomics-specific methods and file formats, designing seamless and scalable end-to-end workflows and exploring the massive amounts of output data have become studies unto themselves. One-click bioinformatics pipelines have helped to organize these tools into targeted workflows, but they suffer from general compatibility and maintainability issues, and preclude replication.
Methods: To address the gap in easily extensible yet robustly distributable metagenomics workflows, we have developed a module-based metagenomics analysis system, the "Core Analysis Modular Pipeline" (CAMP), written in Snakemake, a popular workflow management system, along with a standardized module and working directory architecture. Each module can be run independently or in series with others to produce the target data format (e.g. short-read preprocessing alone, or short-read preprocessing followed by de novo assembly), and each outputs aggregated summary statistics reports and semi-guided Jupyter notebook-based visualizations.
Results: We have applied CAMP to a set of ten metagenomics samples to demonstrate how a modular analysis system with built-in data visualization at intermediate steps facilitates rich and seamless inter-communication between output data from different analytic purposes.
Availability: The CAMP ecosystem (module template and analysis modules) can be found at https://github.com/Meta-CAMP.
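Illustrative note: the modular design described under Methods (each module runnable alone or in series with others) can be sketched roughly as follows. This is a minimal Python illustration, not CAMP's actual code or interface; the module names, the `run_modules` helper, and the string payloads are all hypothetical stand-ins, and the real system is implemented as Snakemake workflows.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for analysis modules. The names and the string
# payloads are illustrative assumptions, not CAMP's real interface.
Module = Callable[[str], str]


def short_read_qc(data: str) -> str:
    """Pretend to preprocess short reads."""
    return f"qc({data})"


def assembly(data: str) -> str:
    """Pretend to assemble preprocessed reads de novo."""
    return f"assembly({data})"


MODULES: Dict[str, Module] = {
    "short_read_qc": short_read_qc,
    "assembly": assembly,
}


def run_modules(names: List[str], raw_input: str) -> Dict[str, str]:
    """Run the named modules in order, feeding each output to the next.

    Every intermediate output is returned, keyed by module name, so that
    results can be inspected at each step rather than only at the end.
    """
    outputs: Dict[str, str] = {}
    current = raw_input
    for name in names:
        current = MODULES[name](current)
        outputs[name] = current
    return outputs


# One module on its own, then the same module followed by a second one.
alone = run_modules(["short_read_qc"], "sample_1.fastq")
chained = run_modules(["short_read_qc", "assembly"], "sample_1.fastq")
print(alone)
print(chained)
```

Keeping every intermediate output, rather than only the final one, is what would allow summaries or visualizations to be produced at each step in a design of this kind.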
Competing Interests: No competing interests are declared.
Grant Information: R35 GM138152 United States GM NIGMS NIH HHS; T32 GM083937 United States GM NIGMS NIH HHS
Contributed Indexing: Keywords: De Novo Assembly; Gene Cataloguing; Metagenome-Assembled Genomes; Metagenomics; Metaviromics; Taxonomic Classification; Workflow Management Systems
Entry Date(s): Date Created: 20230417 Latest Revision: 20250723
Update Code: 20250723
PubMed Central ID: PMC10104186
DOI: 10.1101/2023.04.09.536171
PMID: 37066359
Database: MEDLINE