Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm
| Published in: | Bioinformatics, Vol. 36, No. 12, pp. 3669-3679 |
|---|---|
| Main authors: | Firtina, Can; Kim, Jeremie S; Alser, Mohammed; Senol Cali, Damla; Cicek, A Ercument; Alkan, Can; Mutlu, Onur |
| Format: | Journal Article |
| Language: | English |
| Published: | England: Oxford University Press, 1 June 2020 |
| Subjects: | |
| ISSN: | 1367-4803, 1367-4811, 1460-2059 |
Abstract

Motivation

Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively.

Results

We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts.

Availability and implementation

Source code is available at https://github.com/CMU-SAFARI/Apollo.

Supplementary information

Supplementary data are available at Bioinformatics online.

| Field | Value |
|---|---|
| Author | Senol Cali, Damla; Alkan, Can; Firtina, Can; Cicek, A Ercument; Mutlu, Onur; Kim, Jeremie S; Alser, Mohammed |
| Author_xml | 1. Firtina, Can (ORCID 0000-0002-6548-7863; e-mail: can.firtina@inf.ethz.ch), Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland; 2. Kim, Jeremie S, Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland; 3. Alser, Mohammed, Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland; 4. Senol Cali, Damla, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA; 5. Cicek, A Ercument (ORCID 0000-0001-8613-6619), Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey; 6. Alkan, Can (e-mail: can.firtina@inf.ethz.ch), Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey; 7. Mutlu, Onur (e-mail: can.firtina@inf.ethz.ch), Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/32167530 (View this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1093_nar_gkaa889 crossref_primary_10_3390_ijms252111603 crossref_primary_10_1016_j_syapm_2025_126643 crossref_primary_10_1007_s11033_022_07135_4 crossref_primary_10_1186_s12859_025_06091_7 crossref_primary_10_7717_peerj_18132 crossref_primary_10_1007_s13258_023_01458_7 crossref_primary_10_1093_bib_bbad264 crossref_primary_10_1093_nargab_lqab034 crossref_primary_10_1145_3632950 crossref_primary_10_1093_nargab_lqad004 crossref_primary_10_1038_s41598_024_58934_7 crossref_primary_10_1016_j_csbj_2022_08_019 crossref_primary_10_1093_bib_bbab405 crossref_primary_10_1186_s13059_020_02235_5 crossref_primary_10_3390_horticulturae9030302 crossref_primary_10_3390_ijms232012080 crossref_primary_10_1038_s42003_023_05619_y crossref_primary_10_1007_s00425_022_03987_z crossref_primary_10_1038_s41598_021_00178_w crossref_primary_10_1186_s13059_024_03181_2 crossref_primary_10_1016_j_ygeno_2024_110842 crossref_primary_10_1016_j_envres_2025_122591 crossref_primary_10_1186_s12864_022_08577_7 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com |
| DOI | 10.1093/bioinformatics/btaa179 |
| DatabaseName | CrossRef; MEDLINE; MEDLINE (Ovid); PubMed; MEDLINE - Academic |
| Discipline | Biology |
| EISSN | 1460-2059 1367-4811 |
| EndPage | 3679 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article |
| GeographicLocations | Poland |
| ISICitedReferencesCount | 34 |
| ISSN | 1367-4803 1367-4811 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 12 |
| Language | English |
| License | This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model). The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. |
| ORCID | 0000-0002-6548-7863 0000-0001-8613-6619 |
| OpenAccessLink | https://academic.oup.com/bioinformatics/article-pdf/36/12/3669/33437318/btaa179.pdf |
| PMID | 32167530 |
| PageCount | 11 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-06-01 |
| PublicationDecade | 2020 |
| PublicationPlace | England |
| PublicationTitle | Bioinformatics |
| PublicationYear | 2020 |
| Publisher | Oxford University Press |
| StartPage | 3669 |
| SubjectTerms | Algorithms; High-Throughput Nucleotide Sequencing; Poland; Sequence Analysis, DNA; Software; Technology |
| Title | Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/32167530 https://www.proquest.com/docview/2377337562 |
| Volume | 36 |
| Citation | Firtina, Can; Kim, Jeremie S; Alser, Mohammed; Senol Cali, Damla; et al. Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm. Bioinformatics (Oxford, England), 36(12): 3669-3679, 2020-06-01. ISSN 1367-4803; eISSN 1367-4811. doi:10.1093/bioinformatics/btaa179 |