Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm
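| Published in: | BMC Bioinformatics, Vol. 16, No. 1, p. 218 |
|---|---|
| Main authors: | Gibbons, Theodore R.; Mount, Stephen M.; Cooper, Endymion D.; Delwiche, Charles F. |
| Format: | Journal Article |
| Language: | English |
| Published: | London: BioMed Central Ltd, 10 July 2015 |
| Keywords: | MCL; Protein clustering; Sequence clustering; Homology prediction; Graph; Genomics; Bioinformatics; Transcriptomics; Short-read sequencing; High-throughput sequencing |
| ISSN: | 1471-2105 |
| Online access: | Full text |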
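
The study described in this record compares BLAST-based edge weights for an undirected sequence graph that is then clustered with MCL. The following is a minimal sketch of how three of the four metrics (bit score, BSR, NLE) might be computed from BLAST hits. It is not the authors' Porthos code. The function names, the tuple layout, the cap applied to E-values of zero, and the choice to normalize the BSR by the smaller self-hit score are all assumptions made for illustration. BAL is left out because the record does not define the anchored length.

```python
import math

# Hypothetical cap for hits whose E-value is reported as 0.0,
# where the logarithm is undefined (an assumption, not a value from the study).
NLE_CAP = 200.0


def nle(evalue, cap=NLE_CAP):
    """Negative common (base-10) log of the expectation value, capped at `cap`."""
    if evalue <= 0.0:
        return cap
    return min(-math.log10(evalue), cap)


def bsr(bit_score, self_score_a, self_score_b):
    """Bit score ratio. ASSUMPTION: divide by the smaller self-hit bit score."""
    return bit_score / min(self_score_a, self_score_b)


def edge_weights(hits, metric="bit"):
    """Collapse directed BLAST hits into undirected weighted edges.

    `hits` is an iterable of (query, subject, evalue, bit_score) tuples.
    For each unordered pair the largest weight is kept.
    """
    hits = list(hits)
    # Self-hit bit scores are only needed for the BSR.
    self_scores = {}
    for query, subject, _evalue, bit_score in hits:
        if query == subject:
            self_scores[query] = max(bit_score, self_scores.get(query, 0.0))

    edges = {}
    for query, subject, evalue, bit_score in hits:
        if query == subject:
            continue  # self-hits are not edges
        if metric == "bit":
            weight = bit_score
        elif metric == "nle":
            weight = nle(evalue)
        elif metric == "bsr":
            if query not in self_scores or subject not in self_scores:
                continue  # cannot normalize without both self-hits
            weight = bsr(bit_score, self_scores[query], self_scores[subject])
        else:
            raise ValueError(f"unknown metric: {metric!r}")
        pair = (query, subject) if query < subject else (subject, query)
        if pair not in edges or weight > edges[pair]:
            edges[pair] = weight
    return edges


# Toy hits, invented for illustration only.
hits = [
    ("a", "a", 0.0, 200.0),
    ("b", "b", 0.0, 100.0),
    ("a", "b", 1e-20, 80.0),
    ("b", "a", 1e-18, 78.0),
]
for metric in ("bit", "bsr", "nle"):
    print(metric, edge_weights(hits, metric))
```

In this sketch the bit score needs no self-hits and no special handling of E-values of zero, which is one way to read the abstract's description of it as the more tractable metric.
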
| Abstract | Background
Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here.
Results
All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios.
Conclusions
The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation, Porthos, which uses only standard libraries and can process a graph with more than 25 million edges connecting the more than 60,000 KOG sequences in half a minute using less than half a gigabyte of memory. |
|---|---|
| ArticleNumber | 218 |
| Audience | Academic |
| Author | Gibbons, Theodore R.; Mount, Stephen M.; Cooper, Endymion D.; Delwiche, Charles F. |
| Author_xml | 1. Theodore R. Gibbons (trgibbons@gmail.com): Department of Cell Biology and Molecular Genetics, University of Maryland. 2. Stephen M. Mount: Department of Cell Biology and Molecular Genetics, University of Maryland; Center for Bioinformatics and Computational Biology, University of Maryland. 3. Endymion D. Cooper: Department of Cell Biology and Molecular Genetics, University of Maryland. 4. Charles F. Delwiche: Department of Cell Biology and Molecular Genetics, University of Maryland; Maryland Agricultural Experiment Station, University of Maryland. |
| ContentType | Journal Article |
| Copyright | Gibbons et al. 2015 COPYRIGHT 2015 BioMed Central Ltd. |
| DOI | 10.1186/s12859-015-0625-x |
| Discipline | Biology |
| EISSN | 1471-2105 |
| ExternalDocumentID | PMC4496851 A541358229 26160651 10_1186_s12859_015_0625_x |
| Genre | Research Support, U.S. Gov't, Non-P.H.S.; Journal Article; Research Support, N.I.H., Extramural |
| GrantInformation_xml | – fundername: NIGMS NIH HHS grantid: T32 GM080201 |
| ISSN | 1471-2105 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Keywords | Graph; High-throughput sequencing; Transcriptomics; Short-read sequencing; Protein clustering; Genomics; MCL; Homology prediction; Bioinformatics; Sequence clustering |
| Language | English |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
| OpenAccessLink | https://link.springer.com/10.1186/s12859-015-0625-x |
| PMID | 26160651 |
| PublicationDate | 2015-07-10 |
| PublicationPlace | London, England |
| PublicationTitle | BMC Bioinformatics |
| PublicationYear | 2015 |
| Publisher | BioMed Central Ltd |
| Snippet | Background: Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov... |
| SourceID | pubmedcentral proquest gale pubmed crossref springer |
| SourceType | Open Access Repository; Aggregation Database; Index Database; Enrichment Source; Publisher |
| StartPage | 218 |
| SubjectTerms | Algorithms; Amino Acid Sequence; Bioinformatics; Biomedical and Life Sciences; Cluster Analysis; Computational Biology - methods; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Databases, Factual; Humans; Life Sciences; Markov Chains; Microarrays; Molecular Sequence Data; Proteins - chemistry; Proteins - metabolism; Research Article; Sequence Alignment - methods; Sequence analysis (methods); Sequence Analysis, Protein - methods; Sequence Homology, Amino Acid; Software |
| Title | Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm |
| URI | https://link.springer.com/article/10.1186/s12859-015-0625-x https://www.ncbi.nlm.nih.gov/pubmed/26160651 https://www.proquest.com/docview/1695760725 https://pubmed.ncbi.nlm.nih.gov/PMC4496851 |
| Volume | 16 |
| linkProvider | Springer Nature |