Challenges of Large-Scale Biomedical Workflows on the Cloud -- A Case Study on the Need for Reproducibility of Results

Bibliographic Details
Published in: Proceedings - IEEE Symposium on Computer-Based Medical Systems, pp. 220-225
Main Authors: Kanwal, Sehrish; Lonie, Andrew; Sinnott, Richard O.; Anderson, Charlotte
Format: Conference Proceeding; Journal Article
Language: English
Published: IEEE, 1 June 2015
ISSN: 1063-7125, 2372-9198
Abstract: Computational bioinformatics workflows are used extensively to analyse genomics data. With unprecedented advances in genomic sequencing technology and opportunities for personalized medicine, it is essential that analysis results are repeatable by others, especially when moving into a clinical environment. To cope with the complex computational demands of huge biological datasets, a shift to distributed compute resources is unavoidable. A case study was conducted in which three well-established bioinformatics analysis groups across Australia were assigned to analyse exome sequence data from a range of patients with a rare condition: disorder of sex development. Initially these groups used their own in-house data processing pipelines; subsequently they used a common bioinformatics workbench based upon Galaxy and offered through the Australia-wide National eResearch Collaboration Tools and Resources (NeCTAR) Research Cloud. This paper describes the experiences gained in this work and the variability of the results. We put forward principles that should be used to ensure the reproducibility of scientific results moving forward.
Author 1: Kanwal, Sehrish (skanwal@student.unimelb.edu.au), Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia
Author 2: Lonie, Andrew (alonie@unimelb.edu.au), Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia
Author 3: Sinnott, Richard O. (rsinnott@unimelb.edu.au), Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia
Author 4: Anderson, Charlotte (charlotte.anderson@unimelb.edu.au), Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia
CODEN IEEPAD
DOI 10.1109/CBMS.2015.28
Discipline Medicine
EISBN 1467367753
9781467367752
Genre orig-research
ISICitedReferencesCount 5
IsPeerReviewed false
IsScholarly true
PageCount 6
PublicationTitleAbbrev CBMS
SubjectTerms Australia
Bioinformatics
bioinformatics workflows
Computation
Data processing
Disorders
distributed compute resources
DNA
exome
Genomics
Laboratories
NeCTAR Research Cloud
Reproducibility
Sequential analysis
Software
Workbenches
Workflow
URI https://ieeexplore.ieee.org/document/7167490
https://www.proquest.com/docview/1718956893