Challenges of Large-Scale Biomedical Workflows on the Cloud -- A Case Study on the Need for Reproducibility of Results
| Published in: | Proceedings - IEEE Symposium on Computer-Based Medical Systems pp. 220 - 225 |
|---|---|
| Main Authors: | Kanwal, Sehrish; Lonie, Andrew; Sinnott, Richard O.; Anderson, Charlotte |
| Format: | Conference Proceeding; Journal Article |
| Language: | English |
| Published: | IEEE, 01.06.2015 |
| ISSN: | 1063-7125, 2372-9198 |
| Abstract | Computational bioinformatics workflows are extensively used to analyse genomics data. With the unprecedented advancements in genomic sequencing technology and opportunities for personalized medicine, it is essential that analysis results are repeatable by others, especially when moving into a clinical environment. To cope with the complex computational demands of huge biological datasets, a shift to distributed compute resources is unavoidable. A case study was conducted in which three well-established bioinformatics analysis groups across Australia were assigned to analyse exome sequence data from a range of patients with a rare condition: disorder of sex development. Initially these groups used their own in-house data processing pipelines, and subsequently used a common bioinformatics workbench based upon Galaxy and offered through the Australia-wide National eResearch Collaboration Tools and Resources (NeCTAR) Research Cloud. This paper describes the experiences in this work and the variability of results. We put forward principles that should be used to ensure reproducibility of scientific results moving forward. |
|---|---|
| Author | Sinnott, Richard O.; Anderson, Charlotte; Lonie, Andrew; Kanwal, Sehrish |
| Author_xml | 1. Kanwal, Sehrish (skanwal@student.unimelb.edu.au), Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia; 2. Lonie, Andrew (alonie@unimelb.edu.au), Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia; 3. Sinnott, Richard O. (rsinnott@unimelb.edu.au), Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia; 4. Anderson, Charlotte (charlotte.anderson@unimelb.edu.au), Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding; Journal Article |
| DOI | 10.1109/CBMS.2015.28 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
| DatabaseTitle | Computer and Information Systems Abstracts; Technology Research Database; Computer and Information Systems Abstracts – Academic; Advanced Technologies Database with Aerospace; ProQuest Computer Science Collection; Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Computer and Information Systems Abstracts |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| Discipline | Medicine |
| EISBN | 1467367753 9781467367752 |
| EISSN | 2372-9198 |
| EndPage | 225 |
| ExternalDocumentID | 7167490 |
| Genre | orig-research |
| ISICitedReferencesCount | 5 |
| ISSN | 1063-7125 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 6 |
| PublicationCentury | 2000 |
| PublicationDate | 20150601 |
| PublicationDateYYYYMMDD | 2015-06-01 |
| PublicationDate_xml | – month: 06 year: 2015 text: 20150601 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings - IEEE Symposium on Computer-Based Medical Systems |
| PublicationTitleAbbrev | CBMS |
| PublicationYear | 2015 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | proquest ieee |
| SourceType | Aggregation Database Publisher |
| StartPage | 220 |
| SubjectTerms | Australia; Bioinformatics; bioinformatics workflows; Computation; Data processing; Disorders; distributed compute resources; DNA; exome; Genomics; Laboratories; NeCTAR Research Cloud; Reproducibility; Sequential analysis; Software; Workbenches; Workflow |
| Title | Challenges of Large-Scale Biomedical Workflows on the Cloud -- A Case Study on the Need for Reproducibility of Results |
| URI | https://ieeexplore.ieee.org/document/7167490 https://www.proquest.com/docview/1718956893 |