Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm

Bibliographic Details
Published in: Bioinformatics (Oxford, England), Vol. 28, No. 13, pp. 1684-1691
Main Authors: Löytynoja, Ari; Vilella, Albert J.; Goldman, Nick
Format: Journal Article
Language: English
Published: England, Oxford University Press, 1 July 2012
ISSN: 1367-4803, 1367-4811
Abstract

Motivation: Accurate alignment of large numbers of sequences is demanding and the computational burden is further increased by downstream analyses depending on these alignments. With the abundance of sequence data, an integrative approach of adding new sequences to existing alignments without their full re-computation and maintaining the relative matching of existing sequences is an attractive option. Another current challenge is the extension of reference alignments with fragmented sequences, as those coming from next-generation metagenomics, that contain relatively little information. Widely used methods for alignment extension are based on profile representation of reference sequences. These do not incorporate and use phylogenetic information and are affected by the composition of the reference alignment and the phylogenetic positions of query sequences.

Results: We have developed a method for phylogeny-aware alignment of partial-order sequence graphs and apply it here to the extension of alignments with new data. Our new method, called PAGAN, infers ancestral sequences for the reference alignment and adds new sequences in their phylogenetic context, either to predefined positions or by finding the best placement for sequences of unknown origin. Unlike profile-based alternatives, PAGAN considers the phylogenetic relatedness of the sequences and is not affected by inclusion of more diverged sequences in the reference set. Our analyses show that PAGAN outperforms alternative methods for alignment extension and provides superior accuracy for both DNA and protein data, the improvement being especially large for fragmented sequences. Moreover, PAGAN-generated alignments of noisy next-generation sequencing (NGS) sequences are accurate enough for the use of RNA-seq data in evolutionary analyses.

Availability: PAGAN is written in C++, licensed under the GPL and its source code is available at http://code.google.com/p/pagan-msa.

Contact: ari.loytynoja@helsinki.fi

Supplementary information: Supplementary data are available at Bioinformatics online.
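The idea in the abstract of "finding the best placement for sequences of unknown origin" can be illustrated with a deliberately simplified sketch. This is not PAGAN's algorithm: PAGAN aligns partial-order sequence graphs with phylogeny-aware scoring, whereas the toy below only scores a query fragment against plain candidate sequences (standing in for inferred ancestral sequences) and picks the best one. The function names, node names and scoring values are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Illustrative scores only; these are assumptions, not PAGAN's substitution model.
MATCH, MISMATCH, GAP = 1, -1, -2


def fragment_score(query: str, target: str) -> int:
    """Best score for aligning all of `query` to any stretch of `target`.

    Gaps before and after the aligned stretch of `target` are free, so a
    short fragment is not penalised for covering only part of the target.
    """
    prev = [0] * (len(target) + 1)  # skipping a target prefix costs nothing
    for i, q in enumerate(query, start=1):
        curr = [i * GAP] + [0] * len(target)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (MATCH if q == t else MISMATCH)
            curr[j] = max(diag, prev[j] + GAP, curr[j - 1] + GAP)
        prev = curr
    return max(prev)  # skipping a target suffix costs nothing


def best_placement(query: str, nodes: Dict[str, str]) -> Tuple[str, int]:
    """Return (node name, score) for the best-scoring candidate node."""
    scores = {name: fragment_score(query, seq) for name, seq in nodes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical ancestral sequences at three nodes of a reference tree.
    candidates = {
        "anc_AB": "ACGTACGTAC",
        "anc_CD": "ACGTTTGTAC",
        "root": "ACGTAAGTAC",
    }
    print(best_placement("GTTTGT", candidates))
```

In this toy setting the fragment is assigned to whichever candidate node it matches best; a real phylogeny-aware method would additionally use the tree structure and the evolutionary model when scoring and when adding the sequence to the alignment.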
Author Vilella, Albert J.
Goldman, Nick
Löytynoja, Ari
AuthorAffiliation 1 EMBL-European Bioinformatics Institute, Hinxton, CB10 1SD, UK and 2 Institute of Biotechnology, 00014 University of Helsinki, Finland
BackLink https://www.ncbi.nlm.nih.gov/pubmed/22531217 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author(s) 2012. Published by Oxford University Press. 2012
DOI 10.1093/bioinformatics/bts198
Discipline Biology
EISSN 1367-4811
EndPage 1691
ExternalDocumentID PMC3381962
22531217
10_1093_bioinformatics_bts198
Genre Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: Wellcome Trust
  grantid: GR078968
ISICitedReferencesCount 91
ISSN 1367-4803
1367-4811
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 13
Language English
License http://creativecommons.org/licenses/by-nc/3.0
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: David Posada
OpenAccessLink https://pubmed.ncbi.nlm.nih.gov/PMC3381962
PMID 22531217
PageCount 8
PublicationDate 2012-07-01
PublicationPlace England
PublicationTitle Bioinformatics (Oxford, England)
PublicationTitleAlternate Bioinformatics
PublicationYear 2012
Publisher Oxford University Press
StartPage 1684
SubjectTerms Algorithms
Original Papers
Phylogeny
Sequence Alignment - methods
Sequence Analysis, DNA - methods
Sequence Analysis, Protein
Software
Title Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm
URI https://www.ncbi.nlm.nih.gov/pubmed/22531217
https://www.proquest.com/docview/1022258456
https://pubmed.ncbi.nlm.nih.gov/PMC3381962
Volume 28