Classification of bioinformatics workflows using weighted versions of partitioning and hierarchical clustering algorithms
Background Workflows, or computational pipelines, consisting of collections of multiple linked tasks are becoming more and more popular in many scientific fields, including computational biology. For example, simulation studies, which are now a must for statistical validation of new bioinformatics m...
| Published in: | BMC bioinformatics Vol. 16; no. 1; p. 68 |
|---|---|
| Main authors: | Lord, Etienne; Diallo, Abdoulaye Baniré; Makarenkov, Vladimir |
| Format: | Journal Article |
| Language: | English |
| Publication details: | London: BioMed Central, 03.03.2015; BioMed Central Ltd |
| Subjects: | |
| ISSN: | 1471-2105 |
| Online access: | Get full text |
| Abstract | Background: Workflows, or computational pipelines, consisting of collections of multiple linked tasks are becoming more and more popular in many scientific fields, including computational biology. For example, simulation studies, which are now a must for statistical validation of new bioinformatics methods and software, are frequently carried out using the available workflow platforms. Workflows are typically organized to minimize the total execution time and to maximize the efficiency of the included operations. Clustering algorithms can be applied for regrouping similar workflows for their simultaneous execution on a server, for dispatching some lengthy workflows to different servers, or for classifying the available workflows with a view to performing a specific keyword search. Results: In this study, we consider four different workflow encoding and clustering schemes which are representative of bioinformatics projects. Some of them allow for clustering workflows with similar topological features, while the others regroup workflows according to their specific attributes (e.g. associated keywords) or execution time. The four types of workflow encoding examined in this study were compared using the weighted versions of the k-means and k-medoids partitioning algorithms. The Calinski-Harabasz, Silhouette and logSS clustering indices were considered. Hierarchical classification methods, including the UPGMA, Neighbor Joining, Fitch and Kitsch algorithms, were also applied to classify bioinformatics workflows. Moreover, a novel pairwise measure of clustering solution stability, which can be computed in situations when a series of independent program runs is carried out, was introduced. Conclusions: Our findings based on the analysis of 220 real-life bioinformatics workflows suggest that the weighted clustering models based on keyword information or task execution times provide the most appropriate clustering solutions. Using datasets generated by the Armadillo and Taverna scientific workflow management systems, we found that the weighted cosine distance in association with the k-medoids partitioning algorithm and the presence-absence workflow encoding provided the highest values of the Rand index among all compared clustering strategies. The introduced clustering stability indices, PS and PSG, can be effectively used to identify elements with a low clustering support. |
|---|---|
| ArticleNumber | 68 |
| Audience | Academic |
| Author | Lord, Etienne; Makarenkov, Vladimir; Diallo, Abdoulaye Baniré |
| Author_xml | 1. Lord, Etienne (Département d’informatique, Université du Québec à Montréal; Département de sciences biologiques, Université à Montréal); 2. Diallo, Abdoulaye Baniré (Département d’informatique, Université du Québec à Montréal); 3. Makarenkov, Vladimir (Département d’informatique, Université du Québec à Montréal; email: makarenkov.vladimir@uqam.ca) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/25887434$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1016_j_commatsci_2025_114019 crossref_primary_10_1016_j_culher_2016_03_008 crossref_primary_10_1016_j_heliyon_2024_e41488 crossref_primary_10_1016_j_ins_2017_02_010 crossref_primary_10_1038_s41598_018_29732_9 crossref_primary_10_1061__ASCE_EI_1943_5541_0000325 crossref_primary_10_2196_30890 crossref_primary_10_1186_s12575_018_0067_8 crossref_primary_10_3389_fmed_2025_1503229 crossref_primary_10_1007_s12633_025_03338_z |
| Cites_doi | 10.1093/nar/gks485 10.1007/978-3-540-73560-1_15 10.1093/nar/gkq429 10.1101/gr.4086505 10.1007/978-3-540-28651-6_25 10.1093/molbev/msr121 10.1007/11751588_40 10.1109/CCGrid.2012.109 10.1016/j.jmva.2007.07.002 10.1007/978-3-540-89965-5_18 10.1109/SCC.2010.95 10.1016/j.cosrev.2007.05.001 10.1111/j.1558-5646.1984.tb00255.x 10.1016/0025-5564(81)90043-2 10.1007/978-1-84628-757-2_19 10.1007/s00357-001-0018-x 10.1016/j.csda.2006.11.025 10.1109/MS.2008.92 10.1016/0377-0427(87)90125-7 10.1002/9780470316801 10.1109/ICALT.2001.943942 10.1016/j.csda.2011.09.003 10.1145/1341811.1341822 10.1016/j.drudis.2008.06.005 10.1080/01621459.1971.10482356 10.1109/SERVICES.2012.15 10.7155/jgaa.00139 10.1109/TSC.2010.6 10.1002/cpe.3003 10.1007/BF02294245 10.1093/biomet/asq061 10.1080/03610928308827180 10.1007/BF01246105 10.1126/science.155.3760.279 10.1348/000711007X184849 10.1371/journal.pone.0029903 10.1016/j.patcog.2012.07.021 10.1109/WORKS.2008.4723958 |
| ContentType | Journal Article |
| Copyright | Lord et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. COPYRIGHT 2015 BioMed Central Ltd. |
| DBID | C6C AAYXX CITATION CGR CUY CVF ECM EIF NPM ISR 7X8 5PM |
| DOI | 10.1186/s12859-015-0508-1 |
| DatabaseName | Springer Nature OA Free Journals; CrossRef; Medline; MEDLINE; MEDLINE (Ovid); MEDLINE; MEDLINE; PubMed; Gale In Context: Science; MEDLINE - Academic; PubMed Central (Full Participant titles) |
| DatabaseTitle | CrossRef; MEDLINE; Medline Complete; MEDLINE with Full Text; PubMed; MEDLINE (Ovid); MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic; MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Biology |
| EISSN | 1471-2105 |
| EndPage | 68 |
| ExternalDocumentID | PMC4354763 A541358060 25887434 10_1186_s12859_015_0508_1 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article |
| IEDL.DBID | RSV |
| ISICitedReferencesCount | 13 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000350619800001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1471-2105 |
| IngestDate | Tue Nov 04 01:45:51 EST 2025 Fri Sep 05 06:55:53 EDT 2025 Tue Nov 11 11:02:36 EST 2025 Tue Nov 04 18:20:29 EST 2025 Thu Nov 13 16:40:19 EST 2025 Mon Jul 21 06:06:01 EDT 2025 Sat Nov 29 07:54:55 EST 2025 Tue Nov 18 22:35:09 EST 2025 Sat Sep 06 07:27:17 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Keywords | Bioinformatics workflows, Hierarchical clustering, k-means partitioning, Scientific workflows, Workflow clustering |
| Language | English |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
| LinkModel | DirectLink |
| OpenAccessLink | https://link.springer.com/10.1186/s12859-015-0508-1 |
| PMID | 25887434 |
| PQID | 1674688573 |
| PQPubID | 23479 |
| PageCount | 1 |
| ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_4354763 proquest_miscellaneous_1674688573 gale_infotracmisc_A541358060 gale_infotracacademiconefile_A541358060 gale_incontextgauss_ISR_A541358060 pubmed_primary_25887434 crossref_citationtrail_10_1186_s12859_015_0508_1 crossref_primary_10_1186_s12859_015_0508_1 springer_journals_10_1186_s12859_015_0508_1 |
| PublicationCentury | 2000 |
| PublicationDate | 20150303 2015-03-03 2015-Mar-03 |
| PublicationDateYYYYMMDD | 2015-03-03 |
| PublicationDate_xml | – month: 3 year: 2015 text: 20150303 day: 3 |
| PublicationDecade | 2010 |
| PublicationPlace | London |
| PublicationPlace_xml | – name: London – name: England |
| PublicationTitle | BMC bioinformatics |
| PublicationTitleAbbrev | BMC Bioinformatics |
| PublicationTitleAlternate | BMC Bioinformatics |
| PublicationYear | 2015 |
| Publisher | BioMed Central BioMed Central Ltd |
| Publisher_xml | – name: BioMed Central – name: BioMed Central Ltd |
| SSID | ssj0017805 |
| Score | 2.2239099 |
| StartPage | 68 |
| SubjectTerms | Algorithms; Bioinformatics; Biomedical and Life Sciences; Cluster Analysis; Computational Biology - methods; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Datasets as Topic; Life Sciences; Microarrays; Networks analysis; Phylogeny; Research Article; Software; Workflow |
| Title | Classification of bioinformatics workflows using weighted versions of partitioning and hierarchical clustering algorithms |
| URI | https://link.springer.com/article/10.1186/s12859-015-0508-1 https://www.ncbi.nlm.nih.gov/pubmed/25887434 https://www.proquest.com/docview/1674688573 https://pubmed.ncbi.nlm.nih.gov/PMC4354763 |
| Volume | 16 |