STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions

Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic...

Detailed description

Bibliographic details
Published in: Genome Biology, vol. 22, no. 1, p. 270
Main authors: Katz, Kenneth S.; Shutov, Oleg; Lapoint, Richard; Kimelman, Michael; Brister, J. Rodney; O’Sullivan, Christopher
Format: Journal Article
Language: English
Published: London: BioMed Central, 20 September 2021
Springer Nature B.V
BMC
ISSN: 1474-760X, 1474-7596
Online access: Full text
Abstract Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic to submissions, independent of metadata. We show that our MinHash-based k-mer tool is accurate and scalable, offering reliable criteria for efficient selection of data for further analysis by the scientific community, at once validating submissions while also augmenting sample metadata with reliable, searchable, taxonomic terms.
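The abstract names MinHash over k-mers as the underlying technique, but this record gives no implementation details. As a rough illustration only, and not STAT's actual code, the Python sketch below shows a generic bottom-s MinHash: each sequence is reduced to its smallest k-mer hash values, and two such sketches give an estimate of the Jaccard similarity of the underlying k-mer sets. The function names, the k-mer length of 8, the sketch size of 16, and the use of BLAKE2b as the hash are all assumptions chosen for the example.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, upper-cased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a stable 64-bit integer (hash choice is illustrative)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=8, size=16):
    """Bottom-s sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(a, b, size=16):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    merged = sorted(set(a) | set(b))[:size]  # bottom-s of the union
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

Comparing a read set's sketch against sketches of reference sequences in this way is one common use of MinHash; how STAT itself builds and queries its k-mer databases is described in the article, not here.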
ArticleNumber 270
Author Katz, Kenneth S.
Kimelman, Michael
Brister, J. Rodney
Lapoint, Richard
Shutov, Oleg
O’Sullivan, Christopher
Author_xml – sequence: 1
  givenname: Kenneth S.
  orcidid: 0000-0002-9134-4559
  surname: Katz
  fullname: Katz, Kenneth S.
  email: kskatz@nih.gov
  organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
– sequence: 2
  givenname: Oleg
  surname: Shutov
  fullname: Shutov, Oleg
  organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
– sequence: 3
  givenname: Richard
  surname: Lapoint
  fullname: Lapoint, Richard
  organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
– sequence: 4
  givenname: Michael
  surname: Kimelman
  fullname: Kimelman, Michael
  organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
– sequence: 5
  givenname: J. Rodney
  surname: Brister
  fullname: Brister, J. Rodney
  organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
– sequence: 6
  givenname: Christopher
  surname: O’Sullivan
  fullname: O’Sullivan, Christopher
  organization: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
BackLink https://www.ncbi.nlm.nih.gov/pubmed/34544477 (view this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author(s) 2021
2021. The Author(s).
2021. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1186/s13059-021-02490-0
Discipline Biology
EISSN 1474-760X
EndPage 270
ExternalDocumentID oai_doaj_org_article_3d96de0f6dbf4ab7a780e060482f635e
PMC8450716
34544477
10_1186_s13059_021_02490_0
Genre Journal Article
Research Support, N.I.H., Intramural
GrantInformation_xml – fundername: National Library of Medicine (NLM)
ISICitedReferencesCount 53
ISSN 1474-760X
1474-7596
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords MinHash
Metagenomics
Language English
License 2021. The Author(s).
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
ORCID 0000-0002-9134-4559
OpenAccessLink https://doaj.org/article/3d96de0f6dbf4ab7a780e060482f635e
PMID 34544477
PageCount 1
PublicationCentury 2000
PublicationDate 2021-09-20
PublicationDecade 2020
PublicationPlace London
PublicationPlace_xml – name: London
– name: England
PublicationTitle Genome Biology
PublicationTitleAbbrev Genome Biol
PublicationTitleAlternate Genome Biol
PublicationYear 2021
Publisher BioMed Central
Springer Nature B.V
BMC
StartPage 270
SubjectTerms Accuracy
Animal Genetics and Genomics
Archives & records
Bioinformatics
Biomedical and Life Sciences
Biotechnology
DNA Contamination
Evolutionary Biology
genome
Genomes
High-Throughput Nucleotide Sequencing - methods
Human Genetics
Humans
Life Sciences
Metadata
Metagenomics
Metagenomics - methods
Method
Microbial Genetics and Genomics
MinHash
National Center for Biotechnology Information
Next-generation sequencing
Plant Genetics and Genomics
SARS-CoV-2 - genetics
Software
species diversity
Taxonomy
Title STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions
URI https://link.springer.com/article/10.1186/s13059-021-02490-0
https://www.ncbi.nlm.nih.gov/pubmed/34544477
https://www.proquest.com/docview/2583113661
https://www.proquest.com/docview/2575066765
https://www.proquest.com/docview/2636639624
https://pubmed.ncbi.nlm.nih.gov/PMC8450716
https://doaj.org/article/3d96de0f6dbf4ab7a780e060482f635e
Volume 22