Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm
Long reads produced by third-generation sequencing technologies are used to construct an assembly (i.e., the subject's genome), which is further used in downstream genome analysis. Unfortunately, long reads have high sequencing error rates and a large proportion of bps in these long reads are i...
| Published in: | arXiv.org |
|---|---|
| Main authors: | Firtina, Can; Kim, Jeremie S; Alser, Mohammed; Cali, Damla Senol; Cicek, A Ercument; Alkan, Can; Mutlu, Onur |
| Medium: | Paper |
| Language: | English |
| Published: | Ithaca: Cornell University Library, arXiv.org, 2020-03-07 |
| Subjects: | |
| ISSN: | 2331-8422 |
| Abstract | Long reads produced by third-generation sequencing technologies are used to construct an assembly (i.e., the subject's genome), which is further used in downstream genome analysis. Unfortunately, long reads have high sequencing error rates and a large proportion of bps in these long reads are incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e., read-to-assembly alignment information). However, assembly polishing algorithms can only polish an assembly using reads either from a certain sequencing technology or from a small assembly. Such technology-dependency and assembly-size dependency require researchers to 1) run multiple polishing algorithms and 2) use small chunks of a large genome to use all available read sets and polish large genomes. We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e., both large and small genomes) using reads from all sequencing technologies (i.e., second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo 1) models an assembly as a profile hidden Markov model (pHMM), 2) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm, and 3) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real read sets demonstrate that Apollo is the only algorithm that 1) uses reads from any sequencing technology within a single run and 2) scales well to polish large assemblies without splitting the assembly into multiple parts. |
|---|---|
| Author | Firtina, Can; Kim, Jeremie S; Alser, Mohammed; Cali, Damla Senol; Cicek, A Ercument; Alkan, Can; Mutlu, Onur |
| ContentType | Paper |
| Copyright | 2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.48550/arxiv.1902.04341 |
| Discipline | Physics |
| EISSN | 2331-8422 |
| Genre | Working Paper/Pre-Print |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| OpenAccessLink | https://www.proquest.com/docview/2179187951?pq-origsite=%requestingapplication% |
| PublicationDateYYYYMMDD | 2020-03-07 |
| PublicationPlace | Ithaca |
| PublicationTitle | arXiv.org |
| PublicationYear | 2020 |
| Publisher | Cornell University Library, arXiv.org |
| SubjectTerms | Algorithms; Alignment; Assemblies; Assembly; Dependence; Error analysis; Gene sequencing; Genomes; Markov chains; Polishes; Polishing; State of the art; Viterbi algorithm detectors |
| Title | Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm |
| URI | https://www.proquest.com/docview/2179187951 |
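
The abstract describes decoding a trained hidden Markov model with the Viterbi algorithm to produce a polished sequence. The following is a minimal sketch of that decoding step only, on a toy two-base model. It is not Apollo's implementation: Apollo's profile HMM has a richer state structure and is trained with the Forward-Backward algorithm, whereas here the function name `viterbi`, the constants `STATES`, `START`, `TRANS`, and `EMIT`, and all probability values are hypothetical and set by hand for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a non-empty observation
    sequence. Works in log-space; assumes every probability is non-zero."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: hidden states are "true" bases, observations are
# possibly erroneous read bases. All values are illustrative assumptions.
STATES = ["A", "C"]
START = {"A": 0.5, "C": 0.5}
TRANS = {"A": {"A": 0.9, "C": 0.1}, "C": {"A": 0.1, "C": 0.9}}
EMIT = {"A": {"A": 0.8, "C": 0.2}, "C": {"A": 0.2, "C": 0.8}}

polished = "".join(viterbi("AACAA", STATES, START, TRANS, EMIT))
print(polished)
```

With these hand-set parameters, the isolated `C` in the observed sequence is treated as an error because staying in state `A` is far more probable than switching states twice. In a trained model the same decoding step would use probabilities learned from read-to-assembly alignments.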