To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics

As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied...

Bibliographic Details
Published in: Nucleic acids research, Vol. 48, No. 10, pp. 5217-5234
Main Authors: Elworth, R A Leo, Wang, Qi, Kota, Pavan K, Barberan, C J, Coleman, Benjamin, Balaji, Advait, Gupta, Gaurav, Baraniuk, Richard G, Shrivastava, Anshumali, Treangen, Todd J
Format: Journal Article
Language: English
Published: England, Oxford University Press, 04 June 2020
ISSN: 0305-1048, 1362-4962
Online Access: Full text
Abstract As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
Author Kota, Pavan K
Gupta, Gaurav
Coleman, Benjamin
Balaji, Advait
Wang, Qi
Shrivastava, Anshumali
Treangen, Todd J
Baraniuk, Richard G
Elworth, R A Leo
Barberan, C J
AuthorAffiliation 1 Department of Computer Science, Houston, TX 77005, USA
2 Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
3 Department of Bioengineering, Houston, TX 77005, USA
4 Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
Author_xml – sequence: 1
  givenname: R A Leo
  surname: Elworth
  fullname: Elworth, R A Leo
  organization: Department of Computer Science, Houston, TX 77005, USA
– sequence: 2
  givenname: Qi
  surname: Wang
  fullname: Wang, Qi
  organization: Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
– sequence: 3
  givenname: Pavan K
  surname: Kota
  fullname: Kota, Pavan K
  organization: Department of Bioengineering, Houston, TX 77005, USA
– sequence: 4
  givenname: C J
  surname: Barberan
  fullname: Barberan, C J
  organization: Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
– sequence: 5
  givenname: Benjamin
  surname: Coleman
  fullname: Coleman, Benjamin
  organization: Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
– sequence: 6
  givenname: Advait
  surname: Balaji
  fullname: Balaji, Advait
  organization: Department of Computer Science, Houston, TX 77005, USA
– sequence: 7
  givenname: Gaurav
  surname: Gupta
  fullname: Gupta, Gaurav
  organization: Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
– sequence: 8
  givenname: Richard G
  surname: Baraniuk
  fullname: Baraniuk, Richard G
  organization: Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
– sequence: 9
  givenname: Anshumali
  surname: Shrivastava
  fullname: Shrivastava, Anshumali
  organization: Department of Computer Science, Houston, TX 77005, USA, Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
– sequence: 10
  givenname: Todd J
  orcidid: 0000-0002-3760-564X
  surname: Treangen
  fullname: Treangen, Todd J
  organization: Department of Computer Science, Houston, TX 77005, USA, Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/32338745 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1038_s41592_024_02280_z
crossref_primary_10_1186_s13059_024_03414_4
crossref_primary_10_1038_s41467_022_33869_7
crossref_primary_10_1016_j_tcs_2023_114347
crossref_primary_10_1186_s13059_021_02297_z
crossref_primary_10_1016_j_eswa_2023_121443
crossref_primary_10_1186_s40537_024_00906_9
crossref_primary_10_1093_nar_gkae364
crossref_primary_10_1016_j_scitotenv_2023_165859
crossref_primary_10_1093_bioinformatics_btaf249
ContentType Journal Article
Copyright The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. 2020
DOI 10.1093/nar/gkaa265
DatabaseName CrossRef
MEDLINE
MEDLINE (Ovid)
PubMed
MEDLINE - Academic
PubMed Central (Full Participant titles)
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE

CrossRef
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Anatomy & Physiology
Chemistry
EISSN 1362-4962
EndPage 5234
ExternalDocumentID PMC7261164
32338745
10_1093_nar_gkaa265
Genre Research Support, U.S. Gov't, Non-P.H.S
Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NINDS NIH HHS
  grantid: R21 NS106640
– fundername: NLM NIH HHS
  grantid: T15 LM007093
– fundername: ;
– fundername: ;
  grantid: N00014-18-12571; N00014-17-1-2551
– fundername: ;
  grantid: G001534-7500
– fundername: ;
  grantid: W911NF-17-2-0089
– fundername: ;
  grantid: R21NS106640
– fundername: ;
  grantid: FA9550-18-1-0478
– fundername: ;
  grantid: T15LM007093
– fundername: ;
  grantid: N00014-18-1-2047
– fundername: ;
  grantid: CCF-1911094; IIS-1838177; IIS-1730574
ISICitedReferencesCount 22
ISSN 0305-1048
1362-4962
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 10
Language English
License http://creativecommons.org/licenses/by/4.0
The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
LinkModel OpenURL
Notes These authors share senior authorship.
These authors contributed equally to this work and should be regarded as joint first authors.
ORCID 0000-0002-3760-564X
OpenAccessLink http://dx.doi.org/10.1093/nar/gkaa265
PMID 32338745
PQID 2395254854
PQPubID 23479
PageCount 18
ParticipantIDs pubmedcentral_primary_oai_pubmedcentral_nih_gov_7261164
proquest_miscellaneous_2395254854
pubmed_primary_32338745
crossref_citationtrail_10_1093_nar_gkaa265
crossref_primary_10_1093_nar_gkaa265
PublicationCentury 2000
PublicationDate 2020-06-04
PublicationDateYYYYMMDD 2020-06-04
PublicationDate_xml – month: 06
  year: 2020
  text: 2020-06-04
  day: 04
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Nucleic acids research
PublicationTitleAlternate Nucleic Acids Res
PublicationYear 2020
Publisher Oxford University Press
Publisher_xml – name: Oxford University Press
  year: 1997
  ident: 2020053012202488600_B31
  article-title: On the resemblance and containment of documents
– volume: 274
  start-page: 92
  year: 2018
  ident: 2020053012202488600_B52
  article-title: A resource-frugal probabilistic dictionary and applications in bioinformatics
  publication-title: Discrete. Appl. Math.
  doi: 10.1016/j.dam.2018.03.035
– volume: 537
  start-page: 689
  year: 2016
  ident: 2020053012202488600_B127
  article-title: Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses
  publication-title: Nature
  doi: 10.1038/nature19366
– volume: 34
  start-page: i766
  year: 2018
  ident: 2020053012202488600_B102
  article-title: DREAM-Yara: an exact read mapper for very large databases with short update time
  publication-title: Bioinformatics.
  doi: 10.1093/bioinformatics/bty567
– volume: 2
  start-page: 137
  year: 2007
  ident: 2020053012202488600_B9
  article-title: Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm
  publication-title: Discrete. Math. Theor.
– volume: 58
  start-page: 137
  year: 1999
  ident: 2020053012202488600_B33
  article-title: The space complexity of approximating the frequency moments
  publication-title: J. Comput. Syst. Sci.
  doi: 10.1006/jcss.1997.1545
– volume: 14
  start-page: 629
  year: 2008
  ident: 2020053012202488600_B37
  article-title: Iterative thresholding for sparse approximations
  publication-title: J. Fourier. Anal. Appl.
  doi: 10.1007/s00041-008-9035-z
– volume: 37
  start-page: 783
  year: 2019
  ident: 2020053012202488600_B78
  article-title: Nanopore metagenomics enables rapid clinical diagnosis of bacterial lower respiratory infection
  publication-title: Nat. Biotechnol.
  doi: 10.1038/s41587-019-0156-5
– year: 2019
  ident: 2020053012202488600_B101
  article-title: Ganon: precise metagenomics classification against large and up-to-date sets of reference sequences
– volume: 34
  start-page: 171
  year: 2017
  ident: 2020053012202488600_B105
  article-title: A novel data structure to support ultra-fast taxonomic classification of metagenomic sequences with k-mer signatures
  publication-title: Bioinformatics.
  doi: 10.1093/bioinformatics/btx432
– volume: 51
  start-page: 4203
  year: 2005
  ident: 2020053012202488600_B17
  article-title: Decoding by linear programming
  publication-title: IEEE. T. Inform. Theory
  doi: 10.1109/TIT.2005.858979
– start-page: 1498
  volume-title: Proceedings of the 30th International Conference on Neural Information Processing Systems
  year: 2016
  ident: 2020053012202488600_B46
  article-title: Simple and efficient weighted minwise hashing
– start-page: 335
  volume-title: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic
  year: 2001
  ident: 2020053012202488600_B21
  article-title: Sampling techniques for kernel methods
– start-page: 8
  volume-title: Poster presented at: American Society for Microbiology Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines
  year: 2017
  ident: 2020053012202488600_B80
  article-title: Generating WGS trees with Mashtree
– start-page: 1297
  volume-title: 2010 IEEE International Conference on Data Mining Workshops
  year: 2010
  ident: 2020053012202488600_B60
  article-title: Sliding hyperloglog: estimating cardinality in a data stream over a sliding window
  doi: 10.1109/ICDMW.2010.18
– volume: 15
  start-page: S7
  year: 2014
  ident: 2020053012202488600_B62
  article-title: Fast lossless compression via cascading Bloom filters
  publication-title: BMC. Bioinformatics.
  doi: 10.1186/1471-2105-15-S9-S7
– volume: 6
  start-page: 131
  year: 2002
  ident: 2020053012202488600_B23
  article-title: Adaptive sampling methods for scaling up knowledge discovery algorithms
  publication-title: Data. Min. Knowl. Disc.
  doi: 10.1023/A:1014091514039
– start-page: 1770
  volume-title: Proceedings of the 31st International Conference on Neural Information Processing Systems
  year: 2017
  ident: 2020053012202488600_B76
  article-title: Learned D-AMP: principled neural network based compressive image recovery
– volume: 60
  start-page: 4628
  year: 2012
  ident: 2020053012202488600_B121
  article-title: The pros and cons of compressive sensing for wideband signal acquisition: noise folding versus dynamic range
  publication-title: IEEE Trans. Signal. Proces.
  doi: 10.1109/TSP.2012.2201149
– volume: 53
  start-page: 4655
  year: 2007
  ident: 2020053012202488600_B36
  article-title: Signal recovery from random measurements via orthogonal matching pursuit
  publication-title: IEEE Trans. Inform. Theory.
  doi: 10.1109/TIT.2007.909108
– start-page: 4689
  volume-title: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  year: 2018
  ident: 2020053012202488600_B73
  article-title: Insense: incoherent sensor selection for sparse signals
  doi: 10.1109/ICASSP.2018.8461701
– volume: 52
  start-page: 1289
  year: 2006
  ident: 2020053012202488600_B18
  article-title: Compressed sensing
  publication-title: IEEE Trans. Inform. Theory.
  doi: 10.1109/TIT.2006.871582
– volume: 2
  start-page: 93
  year: 2019
  ident: 2020053012202488600_B3
  article-title: Sketching and sublinear data structures in genomics
  publication-title: Annu. Rev. Biomed. Data Sci.
  doi: 10.1146/annurev-biodatasci-072018-021156
– volume: 17
  start-page: 132
  year: 2016
  ident: 2020053012202488600_B40
  article-title: Mash: fast genome and metagenome distance estimation using MinHash
  publication-title: Genome. Biol.
  doi: 10.1186/s13059-016-0997-x
– volume: 70
  start-page: 3154
  year: 2017
  ident: 2020053012202488600_B43
  article-title: Optimal densification for fast and accurate minwise hashing
  publication-title: Proceedings of the 34th International Conference on Machine Learning
– year: 2019
  ident: 2020053012202488600_B68
  article-title: Sub-linear sequence search via a Repeated And Merged Bloom Filter (RAMBO): indexing 170 TB data in 14 hours
– volume: 35
  start-page: 219
  year: 2018
  ident: 2020053012202488600_B106
  article-title: Metagenomic binning through low-density hashing
  publication-title: Bioinformatics.
  doi: 10.1093/bioinformatics/bty611
– volume: 14
  start-page: 333
  year: 2013
  ident: 2020053012202488600_B4
  article-title: Computational solutions for omics data
  publication-title: Nat. Rev. Genet.
  doi: 10.1038/nrg3433
– volume: 25
  start-page: 766
  year: 2018
  ident: 2020053012202488600_B104
  article-title: A fast approximate algorithm for mapping long reads to large reference databases
  publication-title: J. Comput. Biol.
  doi: 10.1089/cmb.2018.0036
– volume: 59
  start-page: 80
  year: 2016
  ident: 2020053012202488600_B24
  article-title: RandNLA: randomized numerical linear algebra
  publication-title: Commun. Acm.
  doi: 10.1145/2842602
– volume: 32
  start-page: 3492
  year: 2016
  ident: 2020053012202488600_B51
  article-title: ntHash: recursive nucleotide hashing
  publication-title: Bioinformatics.
  doi: 10.1093/bioinformatics/btw397
– volume: 37
  start-page: 160
  year: 2019
  ident: 2020053012202488600_B116
  article-title: Capturing sequence diversity in metagenomes with comprehensive and scalable probe design
  publication-title: Nat. Biotechnol.
  doi: 10.1038/s41587-018-0006-x
– start-page: 390
  volume-title: Proceedings 41st Annual Symposium on Foundations of Computer Science
  year: 2000
  ident: 2020053012202488600_B12
  article-title: Opportunistic data structures with applications
  doi: 10.1109/SFCS.2000.892127
– year: 2019
  ident: 2020053012202488600_B72
  article-title: Adaptive compressed sensing MRI with unsupervised learning
– volume: 25
  start-page: 193
  year: 2012
  ident: 2020053012202488600_B120
  article-title: Polymicrobial Interactions: impact on Pathogenesis and Human Disease
  publication-title: Clin. Microbiol. Rev.
  doi: 10.1128/CMR.00013-11
– volume: 13
  start-page: e1005777
  year: 2017
  ident: 2020053012202488600_B58
  article-title: Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing
  publication-title: PLoS. Comput. Biol.
  doi: 10.1371/journal.pcbi.1005777
– volume: 188
  start-page: 104987
  year: 2019
  ident: 2020053012202488600_B1
  article-title: Probabilistic data structures for big data analytics: A comprehensive review
  publication-title: Knowl.-Based. Syst.
  doi: 10.1016/j.knosys.2019.104987
– volume: 2
  start-page: e1600025
  year: 2016
  ident: 2020053012202488600_B119
  article-title: Universal microbial diagnostics using random DNA probes
  publication-title: Sci. Adv.
  doi: 10.1126/sciadv.1600025
– volume: 13
  start-page: 2735
  year: 2012
  ident: 2020053012202488600_B30
  article-title: Linear regression with random projections
  publication-title: J. Mach. Learn. Res.
– volume: 20
  start-page: 389
  year: 2019
  ident: 2020053012202488600_B41
  article-title: Viral coinfection analysis using a MinHash toolkit
  publication-title: BMC. Bioinformatics.
  doi: 10.1186/s12859-019-2918-y
– start-page: 380
  volume-title: Proceedings of the 34th Annual ACM Symposium on Theory of Computing
  year: 2002
  ident: 2020053012202488600_B47
  article-title: Similarity estimation techniques from rounding algorithms
– start-page: 281
  volume-title: Proceedings 26th Annual Symposium on Foundations of Computer Science (sfcs 1985)
  year: 1985
  ident: 2020053012202488600_B55
  article-title: Robin hood hashing
  doi: 10.1109/SFCS.1985.48
– volume: 39
  start-page: D19
  year: 2010
  ident: 2020053012202488600_B132
  article-title: The sequence read archive
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gkq1019
– start-page: 604
  volume-title: Proceedings of the 30th Annual ACM Symposium on Theory of Computing
  year: 1998
  ident: 2020053012202488600_B32
  article-title: Approximate nearest neighbors: towards removing the curse of dimensionality
– start-page: 143
  volume-title: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
  year: 2000
  ident: 2020053012202488600_B29
  article-title: Experiments with random projection
– volume: 20
  start-page: 232
  year: 2019
  ident: 2020053012202488600_B94
  article-title: Mash Screen: high-throughput sequence containment estimation for genome discovery
  publication-title: Genome. Biol.
  doi: 10.1186/s13059-019-1841-x
– volume: 59
  start-page: 72
  year: 2016
  ident: 2020053012202488600_B2
  article-title: Computational biology in the 21st century: Scaling with compressive algorithms
  publication-title: Commun. Acm.
  doi: 10.1145/2957324
– year: 2017
  ident: 2020053012202488600_B81
  article-title: Variant tolerant read mapping using min-hashing
– volume-title: The Random Projection Method (Vol. 65)
  year: 2004
  ident: 2020053012202488600_B20
– volume: 80
  start-page: 80
  year: 2018
  ident: 2020053012202488600_B113
  article-title: MISSION: ultra large-scale feature selection using count-sketches
  publication-title: Proceedings of the 35th International Conference on Machine Learning
– volume: 464
  start-page: 59
  year: 2010
  ident: 2020053012202488600_B126
  article-title: A human gut microbial gene catalogue established by metagenomic sequencing
  publication-title: Nature
  doi: 10.1038/nature08821
– volume: 3
  start-page: 1968
  year: 2013
  ident: 2020053012202488600_B115
  article-title: How much metagenomic sequencing is enough to achieve a given goal?
  publication-title: Sci. Rep.-UK.
  doi: 10.1038/srep01968
– year: 2019
  ident: 2020053012202488600_B67
  article-title: RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in sub-linear time
– year: 2019
  ident: 2020053012202488600_B75
  article-title: The sparse recovery autoencoder
– volume: 20
  start-page: 257
  year: 2019
  ident: 2020053012202488600_B98
  article-title: Improved metagenomic analysis with Kraken 2
  publication-title: Genome. Biol.
  doi: 10.1186/s13059-019-1891-0
– volume-title: Proceedings of the Text Mining Workshop, at the 3rd SIAM International Conference on Data Mining
  year: 2003
  ident: 2020053012202488600_B27
  article-title: Dimensionality reduction by random projection and latent semantic indexing
– volume: 3
  start-page: 505
  year: 2018
  ident: 2020053012202488600_B85
  article-title: Finch: a tool adding dynamic abundance filtering to genomic MinHashing
  publication-title: J. Open Source Softw.
  doi: 10.21105/joss.00505
– volume: 12
  start-page: e1005713
  year: 2016
  ident: 2020053012202488600_B122
  article-title: Genome skimming: a rapid approach to gaining diverse biological insights into multicellular pathogens
  publication-title: PLoS. Pathog.
  doi: 10.1371/journal.ppat.1005713
– volume: 47
  start-page: D666
  year: 2018
  ident: 2020053012202488600_B129
  article-title: IMG/M v. 5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gky901
– volume: 20
  start-page: 199
  year: 2019
  ident: 2020053012202488600_B5
  article-title: When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data
  publication-title: Genome. Biol.
  doi: 10.1186/s13059-019-1809-x
– volume: 7
  start-page: 1008
  year: 2016
  ident: 2020053012202488600_B107
  article-title: The ecologist’s field guide to sequence-based identification of biodiversity
  publication-title: Methods. Ecol. Evol.
  doi: 10.1111/2041-210X.12574
– start-page: 2672
  volume-title: Proceedings of the 24th International Conference on Neural Information Processing Systems
  year: 2011
  ident: 2020053012202488600_B49
  article-title: Hashing algorithms for large-scale learning
– volume: 20
  start-page: 341
  year: 2019
  ident: 2020053012202488600_B79
  article-title: Clinical metagenomics
  publication-title: Nat. Rev. Genet.
  doi: 10.1038/s41576-019-0113-7
– start-page: 604
  volume-title: Proceedings of the 30th Annual ACM Symposium on Theory of Computing
  year: 1998
  ident: 2020053012202488600_B7
  article-title: Approximate nearest neighbors: towards removing the curse of dimensionality
– volume: 13
  start-page: 422
  year: 1970
  ident: 2020053012202488600_B10
  article-title: Space/time trade-offs in hash coding with allowable errors
  publication-title: Commun. Acm.
  doi: 10.1145/362686.362692
– volume-title: Data Streams: Models and Algorithms (Vol. 31)
  year: 2007
  ident: 2020053012202488600_B34
  doi: 10.1007/978-0-387-47534-9
– volume: 28
  start-page: 253
  year: 2008
  ident: 2020053012202488600_B71
  article-title: A simple proof of the restricted isometry property for random matrices
  publication-title: Constr. Approx.
  doi: 10.1007/s00365-007-9003-x
– volume: 1
  start-page: e00020-16
  year: 2016
  ident: 2020053012202488600_B112
  article-title: MetaPalette: a k-mer painting approach for metagenomic taxonomic profiling and quantification of novel strain variation
  publication-title: MSystems
  doi: 10.1128/mSystems.00020-16
– start-page: 1
  volume-title: 16th International Symposium on Experimental Algorithms
  year: 2017
  ident: 2020053012202488600_B53
  article-title: Fast and scalable minimal perfect hashing for massive key sets
– start-page: 21
  volume-title: Proceedings of the Compression and Complexity of Sequences
  year: 1997
  ident: 2020053012202488600_B6
  article-title: On the resemblance and containment of documents
– volume: 51
  start-page: 122
  year: 2004
  ident: 2020053012202488600_B56
  article-title: Cuckoo hashing
  publication-title: J. Algorithm.
  doi: 10.1016/j.jalgor.2003.12.002
– start-page: 508
  volume-title: Proceedings 38th Annual Symposium on Foundations of Computer Science
  year: 1997
  ident: 2020053012202488600_B22
  article-title: A random sampling based algorithm for learning the intersection of half-spaces
  doi: 10.1109/SFCS.1997.646139
– start-page: 886
  volume-title: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics
  year: 2014
  ident: 2020053012202488600_B48
  article-title: In defense of minhash over simhash
– volume-title: 7th International Conference on Learning Representations
  year: 2019
  ident: 2020053012202488600_B74
  article-title: A data-driven and distributed approach to sparse signal representation and recovery
– volume: 4
  start-page: 233
  year: 1979
  ident: 2020053012202488600_B117
  article-title: A greedy heuristic for the set-covering problem
  publication-title: Math. Oper. Res.
  doi: 10.1287/moor.4.3.233
– volume: 449
  start-page: 804
  year: 2007
  ident: 2020053012202488600_B130
  article-title: The human microbiome project
  publication-title: Nature
  doi: 10.1038/nature06244
– volume: 32
  start-page: 1023
  year: 2015
  ident: 2020053012202488600_B114
  article-title: Large-scale machine learning for metagenomics sequence classification
  publication-title: Bioinformatics.
  doi: 10.1093/bioinformatics/btv683
– volume: 4
  start-page: 900
  year: 2015
  ident: 2020053012202488600_B89
  article-title: The khmer software package: enabling efficient nucleotide sequence analysis [version 1; peer review: 2 approved, 1 approved with reservations]
  publication-title: F1000Research
  doi: 10.12688/f1000research.6924.1
– volume: 4
  start-page: 27
  year: 2015
  ident: 2020053012202488600_B125
  article-title: The ocean sampling day consortium
  publication-title: Gigascience
  doi: 10.1186/s13742-015-0066-5
– volume: 33
  start-page: D501
  year: 2005
  ident: 2020053012202488600_B134
  article-title: NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gki025
– start-page: 257
  volume-title: International Conference on Research in Computational Molecular Biology
  year: 2017
  ident: 2020053012202488600_B92
  article-title: Improved search of large transcriptomic sequencing databases using split sequence bloom trees
  doi: 10.1007/978-3-319-56970-3_16
StartPage 5217
SubjectTerms Algorithms
Humans
Metagenome - genetics
Metagenomics - methods
Probability
Signal Processing, Computer-Assisted
Survey and Summary
Title To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics
URI https://www.ncbi.nlm.nih.gov/pubmed/32338745
https://www.proquest.com/docview/2395254854
https://pubmed.ncbi.nlm.nih.gov/PMC7261164
Volume 48