Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm


Bibliographic details
Published in: BMC bioinformatics, Vol. 16, No. 1, p. 218
Main authors: Gibbons, Theodore R.; Mount, Stephen M.; Cooper, Endymion D.; Delwiche, Charles F.
Format: Journal Article
Language: English
Published: London, BioMed Central, 10 July 2015
BioMed Central Ltd
ISSN: 1471-2105
Online access: Full text
Abstract

Background: Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only in how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here.

Results: All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios.

Conclusions: The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation, Porthos, which uses only standard libraries and can process a graph with more than 25 million edges connecting the more than 60,000 KOG sequences in half a minute using less than half a gigabyte of memory.
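The four edge-weighting metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the Porthos code, and the abstract does not give formulas, so several details are assumptions: the BSR is taken here as the bit score divided by a self-alignment bit score chosen by the caller, the "anchored length" for the BAL is passed in rather than computed because its definition is not stated in this record, and the cap applied to the NLE when BLAST reports an e-value of 0 is an arbitrary illustrative value. All function and parameter names are hypothetical.

```python
import math

# Assumed cap for hits whose e-value is reported as 0.0 (log undefined).
# The abstract does not give a cap; 200.0 is only an illustrative choice.
NLE_CAP = 200.0


def nle(evalue, cap=NLE_CAP):
    """Negative common (base-10) log of the expectation value, capped."""
    if evalue <= 0.0:
        return cap
    return min(-math.log10(evalue), cap)


def bit_score_ratio(bit_score, self_bit_score):
    """Bit score divided by a self-alignment bit score (assumed reference)."""
    if self_bit_score <= 0.0:
        raise ValueError("self_bit_score must be positive")
    return bit_score / self_bit_score


def bit_over_anchored_length(bit_score, anchored_length):
    """Bit score divided by an anchored length supplied by the caller."""
    if anchored_length <= 0:
        raise ValueError("anchored_length must be positive")
    return bit_score / anchored_length


def edge_weights(bit_score, evalue, self_bit_score, anchored_length):
    """Return all four candidate weights for one alignment as a dict."""
    return {
        "bit": bit_score,
        "BSR": bit_score_ratio(bit_score, self_bit_score),
        "BAL": bit_over_anchored_length(bit_score, anchored_length),
        "NLE": nle(evalue),
    }


if __name__ == "__main__":
    # Made-up numbers for a single hypothetical hit.
    print(edge_weights(bit_score=180.0, evalue=1e-45,
                       self_bit_score=400.0, anchored_length=300))
```

Under these assumptions, only the bit score is used exactly as BLAST reports it. Each of the other three needs either extra inputs or special handling for an e-value of 0, which may be part of what the authors mean by a more tractable metric.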
ArticleNumber 218
Audience Academic
Author Mount, Stephen M.
Gibbons, Theodore R.
Cooper, Endymion D.
Delwiche, Charles F.
Author_xml – sequence: 1
  givenname: Theodore R.
  surname: Gibbons
  fullname: Gibbons, Theodore R.
  email: trgibbons@gmail.com
  organization: Department of Cell Biology and Molecular Genetics, University of Maryland
– sequence: 2
  givenname: Stephen M.
  surname: Mount
  fullname: Mount, Stephen M.
  organization: Department of Cell Biology and Molecular Genetics, University of Maryland, Center for Bioinformatics and Computational Biology, University of Maryland
– sequence: 3
  givenname: Endymion D.
  surname: Cooper
  fullname: Cooper, Endymion D.
  organization: Department of Cell Biology and Molecular Genetics, University of Maryland
– sequence: 4
  givenname: Charles F.
  surname: Delwiche
  fullname: Delwiche, Charles F.
  organization: Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland Agricultural Experiment Station, University of Maryland
BackLink https://www.ncbi.nlm.nih.gov/pubmed/26160651$$D View this record in MEDLINE/PubMed
ContentType Journal Article
Copyright Gibbons et al. 2015
COPYRIGHT 2015 BioMed Central Ltd.
DOI 10.1186/s12859-015-0625-x
Discipline Biology
EISSN 1471-2105
ExternalDocumentID PMC4496851
A541358229
26160651
10_1186_s12859_015_0625_x
Genre Research Support, U.S. Gov't, Non-P.H.S.
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NIGMS NIH HHS
  grantid: T32 GM080201
IEDL.DBID RSV
ISICitedReferencesCount 15
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000357631800001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1471-2105
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Graph
High-throughput sequencing
Transcriptomics
Short-read sequencing
Protein clustering
Genomics
MCL
Homology prediction
Bioinformatics
Sequence clustering
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
OpenAccessLink https://link.springer.com/10.1186/s12859-015-0625-x
PMID 26160651
PublicationCentury 2000
PublicationDate 2015-07-10
PublicationDecade 2010
PublicationPlace London
PublicationPlace_xml – name: London
– name: England
PublicationTitle BMC bioinformatics
PublicationTitleAbbrev BMC Bioinformatics
PublicationTitleAlternate BMC Bioinformatics
PublicationYear 2015
Publisher BioMed Central
BioMed Central Ltd
Publisher_xml – name: BioMed Central
– name: BioMed Central Ltd
StartPage 218
SubjectTerms Algorithms
Amino Acid Sequence
Bioinformatics
Biomedical and Life Sciences
Cluster Analysis
Computational Biology - methods
Computational Biology/Bioinformatics
Computer Appl. in Life Sciences
Databases, Factual
Humans
Life Sciences
Markov Chains
Microarrays
Molecular Sequence Data
Proteins - chemistry
Proteins - metabolism
Research Article
Sequence Alignment - methods
Sequence analysis (methods)
Sequence Analysis, Protein - methods
Sequence Homology, Amino Acid
Software
Title Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm
URI https://link.springer.com/article/10.1186/s12859-015-0625-x
https://www.ncbi.nlm.nih.gov/pubmed/26160651
https://www.proquest.com/docview/1695760725
https://pubmed.ncbi.nlm.nih.gov/PMC4496851
Volume 16