Sequoya: multiobjective multiple sequence alignment in Python
| Published in: | Bioinformatics Vol. 36, No. 12, pp. 3892-3893 |
|---|---|
| Main authors: | Benítez-Hidalgo, Antonio; Nebro, Antonio J; Aldana-Montes, José F |
| Format: | Journal Article |
| Language: | English |
| Published: | England: Oxford University Press, 1 June 2020 |
| Subjects: | |
| ISSN: | 1367-4803, 1367-4811, 1460-2059 |
| Online access: | Full text |
| Abstract | Abstract
Motivation
Multiple sequence alignment (MSA) consists of finding the optimal alignment of three or more biological sequences to identify highly conserved regions that may reflect similarities and relationships between the sequences. MSA is an NP-hard optimization problem (non-deterministic polynomial-time hard); in practice, the time needed to find optimal alignments rises exponentially with the number of sequences and their length. Furthermore, the problem becomes multiobjective when more than one score is used to assess the quality of an alignment, such as maximizing the percentage of totally conserved columns and minimizing the number of gaps. Our motivation is to provide a Python tool for solving MSA problems using evolutionary algorithms, a nonexact stochastic optimization approach that has proven effective for solving multiobjective problems.
Results
The software tool we have developed, called Sequoya, is written in the Python programming language, which offers a broad set of libraries for data analysis, visualization and parallelism. Building on these, Sequoya offers a graphical tool to visualize the progress of the optimization in real time, the ability to guide the search toward a preferred region at run time, parallel support to distribute the computation among the nodes of a distributed computing system, and a graphical component to assist in analyzing the solutions found at the end of the optimization.
Availability and implementation
Sequoya can be freely obtained from the Python Package Index (PyPI) using pip or, alternatively, downloaded from GitHub at https://github.com/benhid/Sequoya.
Supplementary information
Supplementary data are available at Bioinformatics online. |
|---|---|
| Author | Benítez-Hidalgo, Antonio; Nebro, Antonio J; Aldana-Montes, José F |
| Author_xml | 1. Benítez-Hidalgo, Antonio (email: antonio.benitez@lcc.uma.es; organization: Departamento de Lenguajes y Ciencias de la Computación); 2. Nebro, Antonio J (ORCID: 0000-0001-5580-0484; organization: Departamento de Lenguajes y Ciencias de la Computación); 3. Aldana-Montes, José F (organization: Departamento de Lenguajes y Ciencias de la Computación) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/32315391 (view this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1007_s11227_022_04697_9 crossref_primary_10_1145_3763229 crossref_primary_10_1007_s00425_024_04420_3 crossref_primary_10_1016_j_compbiolchem_2022_107661 crossref_primary_10_3390_mi14081577 |
| Cites_doi | 10.1109/TCBB.2007.070203 10.1002/9781119273769 10.1089/cmb.1994.1.337 10.1109/4235.996017 10.1093/bioinformatics/btt360 10.1016/j.swevo.2019.100598 10.1145/937503.937505 10.1093/bioinformatics/btx338 10.1016/0022-2836(86)90252-4 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com |
| DBID | AAYXX CITATION CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1093/bioinformatics/btaa257 |
| DatabaseName | CrossRef; MEDLINE; MEDLINE (Ovid); PubMed; MEDLINE - Academic |
| DatabaseTitle | CrossRef; MEDLINE; Medline Complete; MEDLINE with Full Text; PubMed; MEDLINE (Ovid); MEDLINE - Academic |
| DatabaseTitleList | MEDLINE; MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Biology |
| EISSN | 1460-2059, 1367-4811 |
| EndPage | 3893 |
| ExternalDocumentID | 32315391 10_1093_bioinformatics_btaa257 10.1093/bioinformatics/btaa257 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article |
| ID | FETCH-LOGICAL-c353t-3309be5f5337b5e34416d49a6b7fed9cf0901b0bf8fffd4e390c2257f42e5b813 |
| IEDL.DBID | TOX |
| ISICitedReferencesCount | 4 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000550127500041&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1367-4803 1367-4811 |
| IngestDate | Fri Jul 11 11:00:35 EDT 2025 Wed Feb 19 02:29:06 EST 2025 Tue Nov 18 21:03:46 EST 2025 Sat Nov 29 03:49:17 EST 2025 Wed Aug 28 03:19:48 EDT 2024 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 12 |
| Language | English |
| License | This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model). |
| LinkModel | DirectLink |
| ORCID | 0000-0001-5580-0484 |
| PMID | 32315391 |
| PQID | 2393657968 |
| PQPubID | 23479 |
| PageCount | 2 |
| ParticipantIDs | proquest_miscellaneous_2393657968 pubmed_primary_32315391 crossref_citationtrail_10_1093_bioinformatics_btaa257 crossref_primary_10_1093_bioinformatics_btaa257 oup_primary_10_1093_bioinformatics_btaa257 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-06-01 |
| PublicationDecade | 2020 |
| PublicationPlace | England |
| PublicationTitle | Bioinformatics |
| PublicationYear | 2020 |
| Publisher | Oxford University Press |
| SSID | ssj0051444 ssj0005056 |
| Score | 2.3531528 |
| SourceID | proquest pubmed crossref oup |
| SourceType | Aggregation Database Index Database Enrichment Source Publisher |
| StartPage | 3892 |
| SubjectTerms | Algorithms; Biological Evolution; Programming Languages; Sequence Alignment; Software |
| Title | Sequoya: multiobjective multiple sequence alignment in Python |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/32315391 https://www.proquest.com/docview/2393657968 |
| Volume | 36 |
| WOSCitedRecordID | wos000550127500041&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
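
To make the two objectives named in the abstract concrete, the following is a minimal Python sketch of how the percentage of totally conserved columns and the number of gaps might be computed for a gapped alignment, together with a Pareto-dominance check between two candidate alignments. It is not Sequoya's API: the function names, the `-` gap symbol, and the convention that a column counts as totally conserved only when every sequence holds the same non-gap residue are all assumptions made for illustration.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap symbol


def _check(alignment: List[str]) -> None:
    """Reject empty alignments and rows of unequal length."""
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty row")
    if any(len(row) != len(alignment[0]) for row in alignment):
        raise ValueError("all aligned sequences must have the same length")


def percentage_totally_conserved_columns(alignment: List[str]) -> float:
    """Percentage of columns in which every row holds the same non-gap residue."""
    _check(alignment)
    columns = list(zip(*alignment))  # transpose rows into columns
    conserved = sum(
        1 for col in columns
        if col[0] != GAP and all(symbol == col[0] for symbol in col)
    )
    return 100.0 * conserved / len(columns)


def count_gaps(alignment: List[str]) -> int:
    """Total number of gap symbols over all rows."""
    _check(alignment)
    return sum(row.count(GAP) for row in alignment)


def evaluate(alignment: List[str]) -> Tuple[float, int]:
    """Return (conserved-column percentage to maximize, gap count to minimize)."""
    return percentage_totally_conserved_columns(alignment), count_gaps(alignment)


def dominates(a: Tuple[float, int], b: Tuple[float, int]) -> bool:
    """True if score pair a is no worse than b on both objectives and better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


if __name__ == "__main__":
    first = ["AC-GT", "ACAGT", "AC-GT"]
    second = ["ACG-T-", "ACAGT-", "AC-GT-"]
    print(evaluate(first), evaluate(second))
    print(dominates(evaluate(first), evaluate(second)))
```

A multiobjective evolutionary algorithm would keep the candidate alignments that no other candidate dominates, which is why the result of such a search is a set of trade-off solutions rather than a single alignment.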