STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions
Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic...
| Published in: | Genome Biology, Vol. 22, No. 1, p. 270 |
|---|---|
| Main authors: | Katz, Kenneth S.; Shutov, Oleg; Lapoint, Richard; Kimelman, Michael; Brister, J. Rodney; O’Sullivan, Christopher |
| Format: | Journal Article |
| Language: | English |
| Published: | London: BioMed Central; Springer Nature B.V; BMC, 20 September 2021 |
| Keywords: | MinHash; Metagenomics |
| ISSN: | 1474-760X, 1474-7596 |
| Online access: | Full text |
| Abstract | Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic to submissions, independent of metadata. We show that our MinHash-based k-mer tool is accurate and scalable, offering reliable criteria for efficient selection of data for further analysis by the scientific community, at once validating submissions while also augmenting sample metadata with reliable, searchable, taxonomic terms. |
|---|---|
| ArticleNumber | 270 |
| Author | Katz, Kenneth S.; Kimelman, Michael; Brister, J. Rodney; Lapoint, Richard; Shutov, Oleg; O’Sullivan, Christopher |
| Author_xml | – sequence: 1 givenname: Kenneth S. orcidid: 0000-0002-9134-4559 surname: Katz fullname: Katz, Kenneth S. email: kskatz@nih.gov organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health – sequence: 2 givenname: Oleg surname: Shutov fullname: Shutov, Oleg organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health – sequence: 3 givenname: Richard surname: Lapoint fullname: Lapoint, Richard organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health – sequence: 4 givenname: Michael surname: Kimelman fullname: Kimelman, Michael organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health – sequence: 5 givenname: J. Rodney surname: Brister fullname: Brister, J. Rodney organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health – sequence: 6 givenname: Christopher surname: O’Sullivan fullname: O’Sullivan, Christopher organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34544477 (View this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_7717_peerj_13410 crossref_primary_10_1093_gpbjnl_qzaf072 crossref_primary_10_1093_ismeco_ycae024 crossref_primary_10_3389_fcimb_2021_759697 crossref_primary_10_1038_s41467_023_43960_2 crossref_primary_10_1038_s41586_021_04332_2 crossref_primary_10_1038_s41467_024_52598_7 crossref_primary_10_1016_j_dib_2024_110073 crossref_primary_10_1093_nar_gkae979 crossref_primary_10_1007_s11046_025_00931_z crossref_primary_10_1038_s41467_023_41174_0 crossref_primary_10_1093_ve_veae022 crossref_primary_10_1128_mra_00286_22 crossref_primary_10_7717_peerj_14055 crossref_primary_10_1094_PDIS_06_24_1265_SC crossref_primary_10_1093_nar_gkab1112 crossref_primary_10_1128_mra_00245_24 crossref_primary_10_3389_finsc_2023_1093970 crossref_primary_10_1093_biolinnean_blae028 crossref_primary_10_1142_S2737416525500176 crossref_primary_10_1128_aem_00913_25 crossref_primary_10_1038_s41467_024_47187_7 crossref_primary_10_3390_microorganisms11082096 crossref_primary_10_1016_j_envres_2022_115065 crossref_primary_10_3390_microorganisms11102612 crossref_primary_10_1002_edn3_489 crossref_primary_10_1093_femsre_fuad051 crossref_primary_10_1016_j_cell_2024_12_017 crossref_primary_10_3389_fmars_2023_1159754 crossref_primary_10_1099_mgen_0_001051 crossref_primary_10_1093_gigascience_giae010 crossref_primary_10_1128_msystems_00840_25 crossref_primary_10_1128_spectrum_03426_22 crossref_primary_10_3390_v16030430 crossref_primary_10_1093_nar_gkac298 crossref_primary_10_1186_s13059_023_03141_2 crossref_primary_10_3390_v16060856 crossref_primary_10_3390_v14091859 crossref_primary_10_1093_bib_bbad280 crossref_primary_10_7717_peerj_13821 crossref_primary_10_1128_mbio_01142_25 crossref_primary_10_3390_microorganisms11030790 crossref_primary_10_1093_ve_veae040 crossref_primary_10_1111_1348_0421_13033 crossref_primary_10_1093_nar_gkab1053 crossref_primary_10_1093_nar_gkad1044 crossref_primary_10_1038_s41592_024_02280_z crossref_primary_10_1128_mra_00253_23 crossref_primary_10_1093_nar_gkae364 |
| Cites_doi | 10.1128/mSphere.00160-20 10.1038/s41598-018-29325-6 10.5281/zenodo.5260009 10.1101/cshperspect.a023358 10.1093/nar/gku1207 10.1007/3-540-45123-4_1 10.1093/nar/gkr1079 10.1186/s13059-019-1891-0 10.1371/journal.pone.0020660 10.1186/s13059-019-1841-x 10.1038/sdata.2016.18 10.1093/nar/gkr854 10.1089/cmb.2006.13.1028 10.1016/j.sjbs.2020.04.033 10.1093/bioinformatics/btn322 10.12688/f1000research.23180.2 10.1093/bioinformatics/bth266 10.1093/nar/gkp1078 10.12688/f1000research.19675.1 10.1186/s13059-018-1568-0 10.1186/gb-2014-15-3-r46 10.1016/j.jinf.2020.02.020 10.1093/bib/bbx120 10.1093/bioinformatics/btx334 10.15252/embr.201948316 10.1186/s13059-016-0997-x 10.1038/nrmicro.2016.177 10.1038/nrmicro.2017.13 10.17169/refubium-22374 10.1126/science.1095019 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2021. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.1186/s13059-021-02490-0 |
| Discipline | Biology |
| EISSN | 1474-760X |
| EndPage | 270 |
| ExternalDocumentID | oai_doaj_org_article_3d96de0f6dbf4ab7a780e060482f635e PMC8450716 34544477 10_1186_s13059_021_02490_0 |
| Genre | Journal Article Research Support, N.I.H., Intramural |
| GrantInformation_xml | – fundername: National Library of Medicine (NLM) |
| ISICitedReferencesCount | 53 |
| ISSN | 1474-760X 1474-7596 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Keywords | MinHash; Metagenomics |
| Language | English |
| License | 2021. The Author(s). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
| ORCID | 0000-0002-9134-4559 |
| OpenAccessLink | https://doaj.org/article/3d96de0f6dbf4ab7a780e060482f635e |
| PMID | 34544477 |
| PageCount | 1 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-09-20 |
| PublicationDateYYYYMMDD | 2021-09-20 |
| PublicationDate_xml | – month: 09 year: 2021 text: 2021-09-20 day: 20 |
| PublicationDecade | 2020 |
| PublicationPlace | London |
| PublicationPlace_xml | – name: London – name: England |
| PublicationTitle | Genome Biology |
| PublicationTitleAbbrev | Genome Biol |
| PublicationTitleAlternate | Genome Biol |
| PublicationYear | 2021 |
| Publisher | BioMed Central; Springer Nature B.V; BMC |
| Publisher_xml | – name: BioMed Central – name: Springer Nature B.V – name: BMC |
| SourceID | doaj pubmedcentral proquest pubmed crossref springer |
| SourceType | Open Website Open Access Repository Aggregation Database Index Database Enrichment Source Publisher |
| StartPage | 270 |
| SubjectTerms | Accuracy; Animal Genetics and Genomics; Archives & records; Bioinformatics; Biomedical and Life Sciences; Biotechnology; DNA Contamination; Evolutionary Biology; genome; Genomes; High-Throughput Nucleotide Sequencing - methods; Human Genetics; Humans; Life Sciences; Metadata; Metagenomics; Metagenomics - methods; Method; Microbial Genetics and Genomics; MinHash; National Center for Biotechnology Information; Next-generation sequencing; Plant Genetics and Genomics; SARS-CoV-2 - genetics; Software; species diversity; Taxonomy |
| Title | STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions |
| URI | https://link.springer.com/article/10.1186/s13059-021-02490-0 https://www.ncbi.nlm.nih.gov/pubmed/34544477 https://www.proquest.com/docview/2583113661 https://www.proquest.com/docview/2575066765 https://www.proquest.com/docview/2636639624 https://pubmed.ncbi.nlm.nih.gov/PMC8450716 https://doaj.org/article/3d96de0f6dbf4ab7a780e060482f635e |
| Volume | 22 |