Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm
Motivation: Accurate alignment of large numbers of sequences is demanding and the computational burden is further increased by downstream analyses depending on these alignments. With the abundance of sequence data, an integrative approach of adding new sequences to existing alignments without their...
| Published in: | Bioinformatics (Oxford, England) Vol. 28; no. 13; pp. 1684 - 1691 |
|---|---|
| Main Authors: | Löytynoja, Ari; Vilella, Albert J.; Goldman, Nick |
| Format: | Journal Article |
| Language: | English |
| Published: | England: Oxford University Press, 01.07.2012 |
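| Subjects: | Algorithms; Phylogeny; Sequence Alignment - methods; Sequence Analysis, DNA - methods; Sequence Analysis, Protein; Software |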
| ISSN: | 1367-4803, 1367-4811 |
| Abstract | Motivation: Accurate alignment of large numbers of sequences is demanding and the computational burden is further increased by downstream analyses depending on these alignments. With the abundance of sequence data, an integrative approach of adding new sequences to existing alignments without their full re-computation and maintaining the relative matching of existing sequences is an attractive option. Another current challenge is the extension of reference alignments with fragmented sequences, as those coming from next-generation metagenomics, that contain relatively little information. Widely used methods for alignment extension are based on profile representation of reference sequences. These do not incorporate and use phylogenetic information and are affected by the composition of the reference alignment and the phylogenetic positions of query sequences.
Results: We have developed a method for phylogeny-aware alignment of partial-order sequence graphs and apply it here to the extension of alignments with new data. Our new method, called PAGAN, infers ancestral sequences for the reference alignment and adds new sequences in their phylogenetic context, either to predefined positions or by finding the best placement for sequences of unknown origin. Unlike profile-based alternatives, PAGAN considers the phylogenetic relatedness of the sequences and is not affected by inclusion of more diverged sequences in the reference set. Our analyses show that PAGAN outperforms alternative methods for alignment extension and provides superior accuracy for both DNA and protein data, the improvement being especially large for fragmented sequences. Moreover, PAGAN-generated alignments of noisy next-generation sequencing (NGS) sequences are accurate enough for the use of RNA-seq data in evolutionary analyses.
Availability: PAGAN is written in C++, licensed under the GPL and its source code is available at http://code.google.com/p/pagan-msa.
Contact: ari.loytynoja@helsinki.fi
Supplementary information: Supplementary data are available at Bioinformatics online. |
|---|---|
| Author | Vilella, Albert J.; Goldman, Nick; Löytynoja, Ari |
| AuthorAffiliation | 1 EMBL-European Bioinformatics Institute, Hinxton, CB10 1SD, UK and 2 Institute of Biotechnology, 00014 University of Helsinki, Finland |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/22531217 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2012. Published by Oxford University Press. 2012 |
| DOI | 10.1093/bioinformatics/bts198 |
| Discipline | Biology |
| EISSN | 1367-4811 |
| EndPage | 1691 |
| ExternalDocumentID | PMC3381962 22531217 10_1093_bioinformatics_bts198 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article |
| GrantInformation_xml | Wellcome Trust, grant ID GR078968 |
| ISICitedReferencesCount | 91 |
| ISSN | 1367-4803 1367-4811 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 13 |
| Language | English |
| License | http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
| Notes | Associate Editor: David Posada |
| OpenAccessLink | https://pubmed.ncbi.nlm.nih.gov/PMC3381962 |
| PMID | 22531217 |
| PageCount | 8 |
| PublicationDate | 2012-07-01 |
| PublicationPlace | England |
| PublicationTitle | Bioinformatics (Oxford, England) |
| PublicationTitleAlternate | Bioinformatics |
| PublicationYear | 2012 |
| Publisher | Oxford University Press |
| StartPage | 1684 |
| SubjectTerms | Algorithms; Original Papers; Phylogeny; Sequence Alignment - methods; Sequence Analysis, DNA - methods; Sequence Analysis, Protein; Software |
| Title | Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/22531217 https://www.proquest.com/docview/1022258456 https://pubmed.ncbi.nlm.nih.gov/PMC3381962 |
| Volume | 28 |