Sequoya: multiobjective multiple sequence alignment in Python


Bibliographic details
Published in: Bioinformatics, Vol. 36, No. 12, pp. 3892-3893
Main authors: Benítez-Hidalgo, Antonio; Nebro, Antonio J; Aldana-Montes, José F
Format: Journal Article
Language: English
Published: England, Oxford University Press, 1 June 2020
ISSN: 1367-4803, 1367-4811, 1460-2059
Abstract

Motivation: Multiple sequence alignment (MSA) consists of finding the optimal alignment of three or more biological sequences to identify highly conserved regions that may reflect similarities and relationships between the sequences. MSA is an NP-hard (non-deterministic polynomial-time hard) optimization problem: the time needed to find optimal alignments grows exponentially with the number of sequences and their length. Furthermore, the problem becomes multiobjective when more than one score is used to assess the quality of an alignment, such as maximizing the percentage of totally conserved columns and minimizing the number of gaps. Our motivation is to provide a Python tool for solving MSA problems using evolutionary algorithms, a nonexact stochastic optimization approach that has proven effective at solving multiobjective problems.

Results: The software tool we have developed, called Sequoya, is written in the Python programming language, which offers a broad set of libraries for data analysis, visualization and parallelism. Building on these libraries, Sequoya offers a graphical tool to visualize the progress of the optimization in real time, the ability to guide the search toward a preferred region at run time, parallel support to distribute the computation among the nodes of a distributed computing system, and a graphical component to assist in analyzing the solutions found at the end of the optimization.

Availability and implementation: Sequoya can be freely obtained from the Python Package Index (PyPI, installable with pip) or, alternatively, downloaded from GitHub at https://github.com/benhid/Sequoya.

Supplementary information: Supplementary data are available at Bioinformatics online.
Authors:
1. Antonio Benítez-Hidalgo (antonio.benitez@lcc.uma.es), Departamento de Lenguajes y Ciencias de la Computación
2. Antonio J Nebro (ORCID 0000-0001-5580-0484), Departamento de Lenguajes y Ciencias de la Computación
3. José F Aldana-Montes, Departamento de Lenguajes y Ciencias de la Computación
View this record in MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32315391
Copyright: The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
DOI: 10.1093/bioinformatics/btaa257
Discipline: Biology
Publication type: Journal Article; Research Support, Non-U.S. Gov't
Peer reviewed: yes
License: This article is published and distributed under the terms of the Oxford University Press Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).
Subject terms: Algorithms; Biological Evolution; Programming Languages; Sequence Alignment; Software