Classification of bioinformatics workflows using weighted versions of partitioning and hierarchical clustering algorithms

Detailed bibliography
Published in: BMC bioinformatics, Volume 16, Issue 1, p. 68
Main authors: Lord, Etienne; Diallo, Abdoulaye Baniré; Makarenkov, Vladimir
Medium: Journal Article
Language: English
Publication details: London: BioMed Central (BioMed Central Ltd), 3 March 2015
ISSN: 1471-2105
Abstract
Background: Workflows, or computational pipelines, consisting of collections of multiple linked tasks are becoming increasingly popular in many scientific fields, including computational biology. For example, simulation studies, which are now essential for the statistical validation of new bioinformatics methods and software, are frequently carried out using the available workflow platforms. Workflows are typically organized to minimize the total execution time and to maximize the efficiency of the included operations. Clustering algorithms can be applied to regroup similar workflows for simultaneous execution on a server, to dispatch lengthy workflows to different servers, or to classify the available workflows for a specific keyword search.
Results: In this study, we consider four different workflow encoding and clustering schemes that are representative of bioinformatics projects. Some of them allow for clustering workflows with similar topological features, while others regroup workflows according to their specific attributes (e.g. associated keywords) or execution time. The four types of workflow encoding examined in this study were compared using the weighted versions of the k-means and k-medoids partitioning algorithms. The Calinski-Harabasz, Silhouette and logSS clustering indices were considered. Hierarchical classification methods, including the UPGMA, Neighbor Joining, Fitch and Kitsch algorithms, were also applied to classify bioinformatics workflows. Moreover, we introduced a novel pairwise measure of clustering solution stability, which can be computed when a series of independent program runs is carried out.
Conclusions: Our findings, based on the analysis of 220 real-life bioinformatics workflows, suggest that the weighted clustering models based on keyword information or task execution times provide the most appropriate clustering solutions. Using datasets generated by the Armadillo and Taverna scientific workflow management systems, we found that the weighted cosine distance, in combination with the k-medoids partitioning algorithm and the presence-absence workflow encoding, provided the highest Rand index values among all compared clustering strategies. The introduced clustering stability indices, PS and PSG, can be effectively used to identify elements with low clustering support.
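The abstract reports that presence-absence encoding combined with a weighted cosine distance and k-medoids partitioning gave the highest Rand index. The Python sketch below illustrates that kind of pipeline on toy data; it is not the authors' implementation, and this record does not describe their weighting scheme. Assumptions: the weights are non-negative per-feature (per-task) weights inside the cosine distance, the k-medoids step is the simple alternating (Voronoi-iteration) variant with a naive initialization rather than any specific published variant, and the task names t1 to t6 and the toy workflows are hypothetical. The Rand index is the share of object pairs on which two partitions agree (both together or both apart).

```python
import math
from itertools import combinations


def presence_absence(tasks, vocabulary):
    """Encode a workflow as a 0/1 vector: 1.0 if a task from `vocabulary` occurs in it."""
    present = set(tasks)
    return [1.0 if t in present else 0.0 for t in vocabulary]


def weighted_cosine_distance(x, y, w):
    """1 - weighted cosine similarity; w[j] >= 0 is an assumed per-feature weight."""
    num = sum(wj * xj * yj for wj, xj, yj in zip(w, x, y))
    nx = math.sqrt(sum(wj * xj * xj for wj, xj in zip(w, x)))
    ny = math.sqrt(sum(wj * yj * yj for wj, yj in zip(w, y)))
    if nx == 0.0 or ny == 0.0:
        return 1.0  # convention for an empty (all-zero) weighted encoding
    return max(0.0, 1.0 - num / (nx * ny))  # clamp tiny negative rounding error


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix (naive init: first k objects)."""
    n = len(dist)

    def assign(medoids):
        # each object joins the cluster of its nearest medoid
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid of an empty cluster
                new_medoids.append(medoids[c])
                continue
            # member with the smallest total distance to the rest of its cluster
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


def rand_index(a, b):
    """Rand (1971) index: share of object pairs on which partitions a and b agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    vocab = ["t1", "t2", "t3", "t4", "t5", "t6"]  # hypothetical task names
    flows = [["t1", "t2", "t3"], ["t1", "t2"], ["t4", "t5", "t6"], ["t5", "t6"]]
    weights = [1.0] * len(vocab)  # uniform weights; any non-negative values work
    X = [presence_absence(f, vocab) for f in flows]
    D = [[weighted_cosine_distance(x, y, weights) for y in X] for x in X]
    labels, medoids = k_medoids(D, k=2)
    print(labels, medoids)
    print(rand_index(labels, [0, 0, 1, 1]))
```

On this toy input the sketch separates the two groups of workflows, so its Rand index against the reference partition [0, 0, 1, 1] is 1.0. Setting a weight to zero removes that task from the distance entirely, which is one simple way feature weights can change which workflows look alike.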
Keywords: Bioinformatics workflows, Hierarchical clustering, k-means partitioning, Scientific workflows, Workflow clustering
ArticleNumber 68
Audience Academic
Author Lord, Etienne (Département d’informatique, Université du Québec à Montréal; Département de sciences biologiques, Université de Montréal)
Diallo, Abdoulaye Baniré (Département d’informatique, Université du Québec à Montréal)
Makarenkov, Vladimir (Département d’informatique, Université du Québec à Montréal; makarenkov.vladimir@uqam.ca)
ContentType Journal Article
Copyright Lord et al.; licensee BioMed Central. 2015
COPYRIGHT 2015 BioMed Central Ltd.
DOI 10.1186/s12859-015-0508-1
Discipline Biology
EISSN 1471-2105
EndPage 68
PMCID PMC4354763
Genre Research Support, Non-U.S. Gov't
Journal Article
ISICitedReferencesCount 13
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
OpenAccessLink https://link.springer.com/10.1186/s12859-015-0508-1
PMID 25887434
PublicationDate 2015-03-03
PublicationPlace London, England
PublicationTitle BMC bioinformatics
PublicationTitleAbbrev BMC Bioinformatics
PublicationYear 2015
Publisher BioMed Central (BioMed Central Ltd)
StartPage 68
SubjectTerms Algorithms
Bioinformatics
Biomedical and Life Sciences
Cluster Analysis
Computational Biology - methods
Computational Biology/Bioinformatics
Computer Appl. in Life Sciences
Datasets as Topic
Life Sciences
Microarrays
Networks analysis
Phylogeny
Research Article
Software
Workflow
Title Classification of bioinformatics workflows using weighted versions of partitioning and hierarchical clustering algorithms
URI https://link.springer.com/article/10.1186/s12859-015-0508-1
https://www.ncbi.nlm.nih.gov/pubmed/25887434
https://www.proquest.com/docview/1674688573
https://pubmed.ncbi.nlm.nih.gov/PMC4354763
Volume 16
WOSCitedRecordID wos000350619800001