Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Bibliographic details
Published in: Bioinformatics, Volume 36, Issue 12, pp. 3669-3679
Main authors: Firtina, Can; Kim, Jeremie S; Alser, Mohammed; Senol Cali, Damla; Cicek, A Ercument; Alkan, Can; Mutlu, Onur
Format: Journal Article
Language: English
Published: England, Oxford University Press, 1 June 2020
ISSN: 1367-4803, 1367-4811, 1460-2059
Abstract

Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates, and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing, or fixing, errors in the assembly using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can polish an assembly only with reads from a particular sequencing technology, or can polish only a small assembly. These technology and assembly-size dependencies force researchers to (i) run multiple polishing algorithms in order to use all available read sets and (ii) split a large genome into small chunks in order to polish it.

Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real read sets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts.

Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo.

Supplementary information: Supplementary data are available at Bioinformatics online.
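
Illustrative note: the third step named in the abstract, Viterbi decoding, can be shown on a toy model. The sketch below is not Apollo's implementation. It assumes a hypothetical two-state HMM (the state names AT_rich and GC_rich and all probabilities are invented for illustration) instead of a profile HMM built from an assembly, and it skips the Forward-Backward training step. It only shows how Viterbi picks the single most likely state path for an observed sequence.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path for `observations` under a small HMM.

    Scores are kept in log space so long sequences do not underflow.
    """
    log = math.log
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + log(trans_p[p][s]))
            scores[s] = (best[t - 1][prev] + log(trans_p[prev][s])
                         + log(emit_p[s][observations[t]]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final state and follow the back-pointers.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for t in range(len(observations) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model: every name and number below is an assumption.
STATES = ("AT_rich", "GC_rich")
START_P = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS_P = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT_P = {
    "AT_rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

path = viterbi("AATTGGCC", STATES, START_P, TRANS_P, EMIT_P)
print(path)
```

In a polishing setting the states would instead correspond to positions of the assembly (with match, insertion and deletion behaviour), and the decoded path would spell out the corrected sequence; that mapping is described in the paper itself and is not reproduced here.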
Authors and affiliations
1. Firtina, Can (ORCID: 0000-0002-6548-7863; e-mail: can.firtina@inf.ethz.ch), Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
2. Kim, Jeremie S, Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
3. Alser, Mohammed, Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
4. Senol Cali, Damla, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
5. Cicek, A Ercument (ORCID: 0000-0001-8613-6619), Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
6. Alkan, Can, Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
7. Mutlu, Onur, Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
BackLink https://www.ncbi.nlm.nih.gov/pubmed/32167530 (MEDLINE/PubMed record)
ContentType Journal Article
Copyright The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
DOI 10.1093/bioinformatics/btaa179
Discipline Biology
EISSN 1460-2059
1367-4811
EndPage 3679
ExternalDocumentID 32167530
10_1093_bioinformatics_btaa179
10.1093/bioinformatics/btaa179
Genre Research Support, Non-U.S. Gov't
Journal Article
GeographicLocations Poland
GeographicLocations_xml – name: Poland
ISICitedReferencesCount 34
ISSN 1367-4803
1367-4811
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
License This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
ORCID 0000-0002-6548-7863
0000-0001-8613-6619
OpenAccessLink https://academic.oup.com/bioinformatics/article-pdf/36/12/3669/33437318/btaa179.pdf
PMID 32167530
PageCount 11
PublicationDate 2020-06-01
PublicationPlace England
PublicationTitle Bioinformatics
PublicationYear 2020
Publisher Oxford University Press
  publication-title: BMC Genomics
  doi: 10.1186/1471-2164-14-S1-S13
– volume: 3
  start-page: e890v1
  year: 2015
  ident: 2023063010282440600_btaa179-B48
  article-title: Crossing the streams: a framework for streaming analysis of short DNA sequencing reads
  publication-title: PeerJ PrePrints
StartPage 3669
SubjectTerms Algorithms; High-Throughput Nucleotide Sequencing; Poland; Sequence Analysis, DNA; Software; Technology
Title Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm
URI https://www.ncbi.nlm.nih.gov/pubmed/32167530
https://www.proquest.com/docview/2377337562
Volume 36