To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics
| Published in: | Nucleic Acids Research, Vol. 48, No. 10, pp. 5217-5234 |
|---|---|
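| Main authors: | Elworth, R A Leo; Wang, Qi; Kota, Pavan K; Barberan, C J; Coleman, Benjamin; Balaji, Advait; Gupta, Gaurav; Baraniuk, Richard G; Shrivastava, Anshumali; Treangen, Todd J |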
| Format: | Journal Article |
| Language: | English |
| Published: | England: Oxford University Press, 2020-06-04 |
| ISSN: | 0305-1048, 1362-4962 |
| Online access: | Full text |
| Abstract | As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions. |
| Author | Kota, Pavan K Gupta, Gaurav Coleman, Benjamin Balaji, Advait Wang, Qi Shrivastava, Anshumali Treangen, Todd J Baraniuk, Richard G Elworth, R A Leo Barberan, C J |
| AuthorAffiliation | 1 Department of Computer Science, Houston, TX 77005, USA; 2 Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA; 3 Department of Bioengineering, Houston, TX 77005, USA; 4 Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA |
| Author_xml | 1. Elworth, R A Leo (Department of Computer Science, Houston, TX 77005, USA); 2. Wang, Qi (Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA); 3. Kota, Pavan K (Department of Bioengineering, Houston, TX 77005, USA); 4. Barberan, C J (Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA); 5. Coleman, Benjamin (Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA); 6. Balaji, Advait (Department of Computer Science, Houston, TX 77005, USA); 7. Gupta, Gaurav (Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA); 8. Baraniuk, Richard G (Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA); 9. Shrivastava, Anshumali (Department of Computer Science, Houston, TX 77005, USA; Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA); 10. Treangen, Todd J, ORCID 0000-0002-3760-564X (Department of Computer Science, Houston, TX 77005, USA; Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/32338745 (record in MEDLINE/PubMed) |
| ContentType | Journal Article |
| Copyright | The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. |
| DOI | 10.1093/nar/gkaa265 |
| Discipline | Anatomy & Physiology; Chemistry |
| EISSN | 1362-4962 |
| EndPage | 5234 |
| ExternalDocumentID | PMC7261164 32338745 10_1093_nar_gkaa265 |
| Genre | Journal Article; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, Non-U.S. Gov't; Research Support, N.I.H., Extramural |
| GrantInformation_xml | NINDS NIH HHS: R21 NS106640; NLM NIH HHS: T15 LM007093; funder not stated: N00014-18-12571, N00014-17-1-2551, G001534-7500, W911NF-17-2-0089, R21NS106640, FA9550-18-1-0478, T15LM007093, N00014-18-1-2047, CCF-1911094, IIS-1838177, IIS-1730574 |
| ISICitedReferencesCount | 22 |
| ISSN | 0305-1048 1362-4962 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 10 |
| Language | English |
| License | http://creativecommons.org/licenses/by/4.0 The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
| Notes | These authors share senior authorship. These authors contributed equally to this work and should be regarded as joint first authors. |
| ORCID | 0000-0002-3760-564X |
| OpenAccessLink | http://dx.doi.org/10.1093/nar/gkaa265 |
| PMID | 32338745 |
| PageCount | 18 |
| PublicationDate | 2020-06-04 |
| PublicationPlace | England |
| PublicationTitle | Nucleic acids research |
| PublicationTitleAlternate | Nucleic Acids Res |
| PublicationYear | 2020 |
| Publisher | Oxford University Press |
Theory doi: 10.1109/TIT.2005.858979 – start-page: 1498 volume-title: Proceedings of the 30th International Conference on Neural Information Processing Systems year: 2016 ident: 2020053012202488600_B46 article-title: Simple and efficient weighted minwise hashing – start-page: 335 volume-title: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic year: 2001 ident: 2020053012202488600_B21 article-title: Sampling techniques for kernel methods – start-page: 8 volume-title: Poster presented at: American Society for Microbiology Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines year: 2017 ident: 2020053012202488600_B80 article-title: Generating WGS trees with Mashtree – start-page: 1297 volume-title: 2010 IEEE International Conference on Data Mining Workshops year: 2010 ident: 2020053012202488600_B60 article-title: Sliding hyperloglog: estimating cardinality in a data stream over a sliding window doi: 10.1109/ICDMW.2010.18 – volume: 15 start-page: S7 year: 2014 ident: 2020053012202488600_B62 article-title: Fast lossless compression via cascading Bloom filters publication-title: BMC. Bioinformatics. doi: 10.1186/1471-2105-15-S9-S7 – volume: 6 start-page: 131 year: 2002 ident: 2020053012202488600_B23 article-title: Adaptive sampling methods for scaling up knowledge discovery algorithms publication-title: Data. Min. Knowl. Disc. doi: 10.1023/A:1014091514039 – start-page: 1770 volume-title: Proceedings of the 31st International Conference on Neural Information Processing Systems year: 2017 ident: 2020053012202488600_B76 article-title: Learned D-AMP: principled neural network based compressive image recovery – volume: 60 start-page: 4628 year: 2012 ident: 2020053012202488600_B121 article-title: The pros and cons of compressive sensing for wideband signal acquisition: noise folding versus dynamic range publication-title: IEEE Trans. Signal. Proces. 
doi: 10.1109/TSP.2012.2201149 – volume: 53 start-page: 4655 year: 2007 ident: 2020053012202488600_B36 article-title: Signal recovery from random measurements via orthogonal matching pursuit publication-title: IEEE Trans. Inform. Theory. doi: 10.1109/TIT.2007.909108 – start-page: 4689 volume-title: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) year: 2018 ident: 2020053012202488600_B73 article-title: Insense: incoherent sensor selection for sparse signals doi: 10.1109/ICASSP.2018.8461701 – volume: 52 start-page: 1289 year: 2006 ident: 2020053012202488600_B18 article-title: Compressed sensing publication-title: IEEE Trans. Inform. Theory. doi: 10.1109/TIT.2006.871582 – volume: 2 start-page: 93 year: 2019 ident: 2020053012202488600_B3 article-title: Sketching and sublinear data structures in genomics publication-title: Annu. Rev. Biomed. Data Sci. doi: 10.1146/annurev-biodatasci-072018-021156 – volume: 17 start-page: 132 year: 2016 ident: 2020053012202488600_B40 article-title: Mash: fast genome and metagenome distance estimation using MinHash publication-title: Genome. Biol. doi: 10.1186/s13059-016-0997-x – volume: 70 start-page: 3154 year: 2017 ident: 2020053012202488600_B43 article-title: Optimal densification for fast and accurate minwise hashing publication-title: Proceedings of the 34th International Conference on Machine Learning – year: 2019 ident: 2020053012202488600_B68 article-title: Sub-linear sequence search via a Repeated And Merged Bloom Filter (RAMBO): indexing 170 TB data in 14 hours – volume: 35 start-page: 219 year: 2018 ident: 2020053012202488600_B106 article-title: Metagenomic binning through low-density hashing publication-title: Bioinformatics. doi: 10.1093/bioinformatics/bty611 – volume: 14 start-page: 333 year: 2013 ident: 2020053012202488600_B4 article-title: Computational solutions for omics data publication-title: Nat. Rev. Genet. 
doi: 10.1038/nrg3433 – volume: 25 start-page: 766 year: 2018 ident: 2020053012202488600_B104 article-title: A fast approximate algorithm for mapping long reads to large reference databases publication-title: J. Comput. Biol. doi: 10.1089/cmb.2018.0036 – volume: 59 start-page: 80 year: 2016 ident: 2020053012202488600_B24 article-title: RandNLA: randomized numerical linear algebra publication-title: Commun. Acm. doi: 10.1145/2842602 – volume: 32 start-page: 3492 year: 2016 ident: 2020053012202488600_B51 article-title: ntHash: recursive nucleotide hashing publication-title: Bioinformatics. doi: 10.1093/bioinformatics/btw397 – volume: 37 start-page: 160 year: 2019 ident: 2020053012202488600_B116 article-title: Capturing sequence diversity in metagenomes with comprehensive and scalable probe design publication-title: Nat. Biotechnol. doi: 10.1038/s41587-018-0006-x – start-page: 390 volume-title: Proceedings 41st Annual Symposium on Foundations of Computer Science year: 2000 ident: 2020053012202488600_B12 article-title: Opportunistic data structures with applications doi: 10.1109/SFCS.2000.892127 – year: 2019 ident: 2020053012202488600_B72 article-title: Adaptive compressed sensing MRI with unsupervised learning – volume: 25 start-page: 193 year: 2012 ident: 2020053012202488600_B120 article-title: Polymicrobial Interactions: impact on Pathogenesis and Human Disease publication-title: Clin. Microbiol. Rev. doi: 10.1128/CMR.00013-11 – volume: 13 start-page: e1005777 year: 2017 ident: 2020053012202488600_B58 article-title: Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing publication-title: PLoS. Comput. Biol. doi: 10.1371/journal.pcbi.1005777 – volume: 188 start-page: 104987 year: 2019 ident: 2020053012202488600_B1 article-title: Probabilistic data structures for big data analytics: A comprehensive review publication-title: Knowl.-Based. Syst. 
doi: 10.1016/j.knosys.2019.104987 – volume: 2 start-page: e1600025 year: 2016 ident: 2020053012202488600_B119 article-title: Universal microbial diagnostics using random DNA probes publication-title: Sci. Adv. doi: 10.1126/sciadv.1600025 – volume: 13 start-page: 2735 year: 2012 ident: 2020053012202488600_B30 article-title: Linear regression with random projections publication-title: J. Mach. Learn. Res. – volume: 20 start-page: 389 year: 2019 ident: 2020053012202488600_B41 article-title: Viral coinfection analysis using a MinHash toolkit publication-title: BMC. Bioinformatics. doi: 10.1186/s12859-019-2918-y – start-page: 380 volume-title: Proceedings of the 34th Annual ACM Symposium on Theory of Computing year: 2002 ident: 2020053012202488600_B47 article-title: Similarity estimation techniques from rounding algorithms – start-page: 281 volume-title: Proceedings 26th Annual Symposium on Foundations of Computer Science (sfcs 1985) year: 1985 ident: 2020053012202488600_B55 article-title: Robin hood hashing doi: 10.1109/SFCS.1985.48 – volume: 39 start-page: D19 year: 2010 ident: 2020053012202488600_B132 article-title: The sequence read archive publication-title: Nucleic Acids Res. doi: 10.1093/nar/gkq1019 – start-page: 604 volume-title: Proceedings of the 30th Annual ACM Symposium on Theory of Computing year: 1998 ident: 2020053012202488600_B32 article-title: Approximate nearest neighbors: towards removing the curse of dimensionality – start-page: 143 volume-title: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence year: 2000 ident: 2020053012202488600_B29 article-title: Experiments with random projection – volume: 20 start-page: 232 year: 2019 ident: 2020053012202488600_B94 article-title: Mash Screen: high-throughput sequence containment estimation for genome discovery publication-title: Genome. Biol. 
doi: 10.1186/s13059-019-1841-x – volume: 59 start-page: 72 year: 2016 ident: 2020053012202488600_B2 article-title: Computational biology in the 21st century: Scaling with compressive algorithms publication-title: Commun. Acm. doi: 10.1145/2957324 – year: 2017 ident: 2020053012202488600_B81 article-title: Variant tolerant read mapping using min-hashing – volume-title: The Random Projection Method (Vol. 65) year: 2004 ident: 2020053012202488600_B20 – volume: 80 start-page: 80 year: 2018 ident: 2020053012202488600_B113 article-title: MISSION: ultra large-scale feature selection using count-sketches publication-title: Proceedings of the 35th International Conference on Machine Learning – volume: 464 start-page: 59 year: 2010 ident: 2020053012202488600_B126 article-title: A human gut microbial gene catalogue established by metagenomic sequencing publication-title: Nature doi: 10.1038/nature08821 – volume: 3 start-page: 1968 year: 2013 ident: 2020053012202488600_B115 article-title: How much metagenomic sequencing is enough to achieve a given goal? publication-title: Sci. Rep.-UK. doi: 10.1038/srep01968 – year: 2019 ident: 2020053012202488600_B67 article-title: RAMBO: Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in sub-linear time – year: 2019 ident: 2020053012202488600_B75 article-title: The sparse recovery autoencoder – volume: 20 start-page: 257 year: 2019 ident: 2020053012202488600_B98 article-title: Improved metagenomic analysis with Kraken 2 publication-title: Genome. Biol. doi: 10.1186/s13059-019-1891-0 – volume-title: Proceedings of the Text Mining Workshop, at the 3rd SIAM International Conference on Data Mining year: 2003 ident: 2020053012202488600_B27 article-title: Dimensionality reduction by random projection and latent semantic indexing – volume: 3 start-page: 505 year: 2018 ident: 2020053012202488600_B85 article-title: Finch: a tool adding dynamic abundance filtering to genomic MinHashing publication-title: J. 
Open Source Softw. doi: 10.21105/joss.00505 – volume: 12 start-page: e1005713 year: 2016 ident: 2020053012202488600_B122 article-title: Genome skimming: a rapid approach to gaining diverse biological insights into multicellular pathogens publication-title: PLoS. Pathog. doi: 10.1371/journal.ppat.1005713 – volume: 47 start-page: D666 year: 2018 ident: 2020053012202488600_B129 article-title: IMG/M v. 5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes publication-title: Nucleic Acids Res. doi: 10.1093/nar/gky901 – volume: 20 start-page: 199 year: 2019 ident: 2020053012202488600_B5 article-title: When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data publication-title: Genome. Biol. doi: 10.1186/s13059-019-1809-x – volume: 7 start-page: 1008 year: 2016 ident: 2020053012202488600_B107 article-title: The ecologist’s field guide to sequence-based identification of biodiversity publication-title: Methods. Ecol. Evol. doi: 10.1111/2041-210X.12574 – start-page: 2672 volume-title: Proceedings of the 24th International Conference on Neural Information Processing Systems year: 2011 ident: 2020053012202488600_B49 article-title: Hashing algorithms for large-scale learning – volume: 20 start-page: 341 year: 2019 ident: 2020053012202488600_B79 article-title: Clinical metagenomics publication-title: Nat. Rev. Genet. doi: 10.1038/s41576-019-0113-7 – start-page: 604 volume-title: Proceedings of the 30th Annual ACM Symposium on Theory of Computing year: 1998 ident: 2020053012202488600_B7 article-title: Approximate nearest neighbors: towards removing the curse of dimensionality – volume: 13 start-page: 422 year: 1970 ident: 2020053012202488600_B10 article-title: Space/time trade-offs in hash coding with allowable errors publication-title: Commun. Acm. doi: 10.1145/362686.362692 – volume-title: Data Streams: Models and Algorithms (Vol. 
31) year: 2007 ident: 2020053012202488600_B34 doi: 10.1007/978-0-387-47534-9 – volume: 28 start-page: 253 year: 2008 ident: 2020053012202488600_B71 article-title: A simple proof of the restricted isometry property for random matrices publication-title: Constr. Approx. doi: 10.1007/s00365-007-9003-x – volume: 1 start-page: e00020-16 year: 2016 ident: 2020053012202488600_B112 article-title: MetaPalette: a k-mer painting approach for metagenomic taxonomic profiling and quantification of novel strain variation publication-title: MSystems doi: 10.1128/mSystems.00020-16 – start-page: 1 volume-title: 16th International Symposium on Experimental Algorithms year: 2017 ident: 2020053012202488600_B53 article-title: Fast and scalable minimal perfect hashing for massive key sets – start-page: 21 volume-title: Proceedings of the Compression and Complexity of Sequences year: 1997 ident: 2020053012202488600_B6 article-title: On the resemblance and containment of documents – volume: 51 start-page: 122 year: 2004 ident: 2020053012202488600_B56 article-title: Cuckoo hashing publication-title: J. Algorithm. doi: 10.1016/j.jalgor.2003.12.002 – start-page: 508 volume-title: Proceedings 38th Annual Symposium on Foundations of Computer Science year: 1997 ident: 2020053012202488600_B22 article-title: A random sampling based algorithm for learning the intersection of half-spaces doi: 10.1109/SFCS.1997.646139 – start-page: 886 volume-title: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics year: 2014 ident: 2020053012202488600_B48 article-title: In defense of minhash over simhash – volume-title: 7th International Conference on Learning Representations year: 2019 ident: 2020053012202488600_B74 article-title: A data-driven and distributed approach to sparse signal representation and recovery – volume: 4 start-page: 233 year: 1979 ident: 2020053012202488600_B117 article-title: A greedy heuristic for the set-covering problem publication-title: Math. Oper. 
Res. doi: 10.1287/moor.4.3.233 – volume: 449 start-page: 804 year: 2007 ident: 2020053012202488600_B130 article-title: The human microbiome project publication-title: Nature doi: 10.1038/nature06244 – volume: 32 start-page: 1023 year: 2015 ident: 2020053012202488600_B114 article-title: Large-scale machine learning for metagenomics sequence classification publication-title: Bioinformatics. doi: 10.1093/bioinformatics/btv683 – volume: 4 start-page: 900 year: 2015 ident: 2020053012202488600_B89 article-title: The khmer software package: enabling efficient nucleotide sequence analysis [version 1; peer review: 2 approved, 1 approved with reservations] publication-title: F1000Research doi: 10.12688/f1000research.6924.1 – volume: 4 start-page: 27 year: 2015 ident: 2020053012202488600_B125 article-title: The ocean sampling day consortium publication-title: Gigascience doi: 10.1186/s13742-015-0066-5 – volume: 33 start-page: D501 year: 2005 ident: 2020053012202488600_B134 article-title: NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins publication-title: Nucleic Acids Res. doi: 10.1093/nar/gki025 – start-page: 257 volume-title: International Conference on Research in Computational Molecular Biology year: 2017 ident: 2020053012202488600_B92 article-title: Improved search of large transcriptomic sequencing databases using split sequence bloom trees doi: 10.1007/978-3-319-56970-3_16 |
| SecondaryResourceType | review_article |
| SourceID | pubmedcentral; proquest; pubmed; crossref |
| SourceType | Open Access Repository; Aggregation Database; Index Database; Enrichment Source |
| StartPage | 5217 |
| SubjectTerms | Algorithms; Humans; Metagenome - genetics; Metagenomics - methods; Probability; Signal Processing, Computer-Assisted; Survey and Summary |
| Title | To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/32338745 https://www.proquest.com/docview/2395254854 https://pubmed.ncbi.nlm.nih.gov/PMC7261164 |
| Volume | 48 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=To+Petabytes+and+beyond%3A+recent+advances+in+probabilistic+and+signal+processing+algorithms+and+their+application+to+metagenomics&rft.jtitle=Nucleic+acids+research&rft.au=Elworth%2C+R+A+Leo&rft.au=Wang%2C+Qi&rft.au=Kota%2C+Pavan+K&rft.au=Barberan%2C+C+J&rft.date=2020-06-04&rft.eissn=1362-4962&rft.volume=48&rft.issue=10&rft.spage=5217&rft_id=info:doi/10.1093%2Fnar%2Fgkaa265&rft_id=info%3Apmid%2F32338745&rft.externalDocID=32338745 |