Sequoya: multiobjective multiple sequence alignment in Python

Detailed Description

Bibliographic Details
Published in:	Bioinformatics Vol. 36; No. 12; pp. 3892-3893
Main Authors:	Benítez-Hidalgo, Antonio; Nebro, Antonio J; Aldana-Montes, José F
Format:	Journal Article
Language:	English
Published:	England: Oxford University Press, 1 June 2020
Subjects:	Algorithms; Biological Evolution; Programming Languages; Sequence Alignment; Software
ISSN:	1367-4803, 1367-4811, 1460-2059
Online Access:	Full text

Abstract	Motivation: Multiple sequence alignment (MSA) consists of finding the optimal alignment of three or more biological sequences to identify highly conserved regions that may be the result of similarities and relationships between the sequences. MSA is an optimization problem with NP-hard complexity (non-deterministic polynomial-time hardness), because the time needed to find optimal alignments grows exponentially with the number of sequences and their length. Furthermore, the problem becomes multiobjective when more than one score is considered to assess the quality of an alignment, such as maximizing the percentage of totally conserved columns and minimizing the number of gaps. Our motivation is to provide a Python tool for solving MSA problems using evolutionary algorithms, a nonexact stochastic optimization approach that has proven to be effective for solving multiobjective problems. Results: The software tool we have developed, called Sequoya, is written in the Python programming language, which offers a broad set of libraries for data analysis, visualization and parallelism. Thus, Sequoya offers a graphical tool to visualize the progress of the optimization in real time, the ability to guide the search toward a preferred region at run time, parallel support to distribute the computation among nodes in a distributed computing system, and a graphical component to assist in the analysis of the solutions found at the end of the optimization. Availability and implementation: Sequoya can be freely obtained from the Python Package Index (PyPI) using pip or, alternatively, it can be downloaded from GitHub at https://github.com/benhid/Sequoya. Supplementary information: Supplementary data are available at Bioinformatics online.
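The two example scores named in the abstract (percentage of totally conserved columns, to be maximized, and number of gaps, to be minimized) can be illustrated with a minimal sketch. This is not Sequoya's API: the function names, the `GAP` symbol and the convention that an all-gap column does not count as conserved are assumptions made here for illustration only.

```python
from typing import Sequence, Tuple

GAP = "-"  # assumed gap symbol; hypothetical, not taken from Sequoya


def percentage_totally_conserved_columns(alignment: Sequence[str]) -> float:
    """Percentage of columns in which every sequence has the same non-gap symbol."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if n_cols == 0:
        return 0.0
    conserved = sum(
        1
        for column in zip(*alignment)  # iterate over alignment columns
        if column[0] != GAP and all(symbol == column[0] for symbol in column)
    )
    return 100.0 * conserved / n_cols


def number_of_gaps(alignment: Sequence[str]) -> int:
    """Total number of gap symbols in the alignment."""
    return sum(row.count(GAP) for row in alignment)


def objectives(alignment: Sequence[str]) -> Tuple[float, float]:
    """Both scores as values to minimize (the percentage is negated)."""
    return (
        -percentage_totally_conserved_columns(alignment),
        float(number_of_gaps(alignment)),
    )


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance for minimization: no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


alignment_a = ["AC-GT", "ACAGT", "AC-GT"]
alignment_b = ["AC--GT", "ACA-GT", "AC--GT"]
print(objectives(alignment_a), objectives(alignment_b))
print(dominates(objectives(alignment_a), objectives(alignment_b)))
```

In this toy case the first alignment has more conserved columns and fewer gaps, so it dominates the second. When two alignments each win on one score, neither dominates, and a multiobjective evolutionary algorithm keeps both as trade-off solutions.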
Author	Benítez-Hidalgo, Antonio; Nebro, Antonio J; Aldana-Montes, José F
Author_xml	– sequence: 1; givenname: Antonio; surname: Benítez-Hidalgo; fullname: Benítez-Hidalgo, Antonio; email: antonio.benitez@lcc.uma.es; organization: Departamento de Lenguajes y Ciencias de la Computación
	– sequence: 2; givenname: Antonio J; surname: Nebro; fullname: Nebro, Antonio J; orcidid: 0000-0001-5580-0484; organization: Departamento de Lenguajes y Ciencias de la Computación
	– sequence: 3; givenname: José F; surname: Aldana-Montes; fullname: Aldana-Montes, José F; organization: Departamento de Lenguajes y Ciencias de la Computación
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/32315391 (View this record in MEDLINE/PubMed)
CitedBy_id	crossref_primary_10_1007_s11227_022_04697_9 crossref_primary_10_1145_3763229 crossref_primary_10_1007_s00425_024_04420_3 crossref_primary_10_1016_j_compbiolchem_2022_107661 crossref_primary_10_3390_mi14081577
Cites_doi	10.1109/TCBB.2007.070203 10.1002/9781119273769 10.1089/cmb.1994.1.337 10.1109/4235.996017 10.1093/bioinformatics/btt360 10.1016/j.swevo.2019.100598 10.1145/937503.937505 10.1093/bioinformatics/btx338 10.1016/0022-2836(86)90252-4
ContentType	Journal Article
Copyright	The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
DOI	10.1093/bioinformatics/btaa257
DatabaseName	CrossRef; MEDLINE; MEDLINE (Ovid); PubMed; MEDLINE - Academic
Discipline	Biology
EISSN	1460-2059 1367-4811
EndPage	3893
ExternalDocumentID	32315391 10_1093_bioinformatics_btaa257 10.1093/bioinformatics/btaa257
Genre	Research Support, Non-U.S. Gov't; Journal Article
ISICitedReferencesCount	4
ISSN	1367-4803 1367-4811
IsPeerReviewed	true
IsScholarly	true
Issue	12
Language	English
License	This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).
ORCID	0000-0001-5580-0484
PMID	32315391
PageCount	2
PublicationCentury	2000
PublicationDate	2020-06-01
PublicationDecade	2020
PublicationPlace	England
PublicationTitle	Bioinformatics
PublicationYear	2020
Publisher	Oxford University Press
StartPage	3892
SubjectTerms	Algorithms; Biological Evolution; Programming Languages; Sequence Alignment; Software
Title	Sequoya: multiobjective multiple sequence alignment in Python
URI	https://www.ncbi.nlm.nih.gov/pubmed/32315391 https://www.proquest.com/docview/2393657968
Volume	36