From Scripted HPC-Based NGS Pipelines to Workflows on the Cloud

Bibliographic Details
Published in: 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 694-700
Main Authors: Cala, Jacek; Xu, Yaobo; Wijaya, Eldarina Azfar; Missier, Paolo
Format: Conference paper
Language: English
Publication details: IEEE, 1 May 2014
Abstract In this paper we describe our initial experiences in the Cloud-e-Genome project of moving a whole-exome sequencing pipeline from a scripted, HPC-based solution to a workflow enactment system running in the cloud. We discuss the shortcomings of the existing script-based approach and list the benefits that a workflow-based solution can provide. Despite the effort involved in wrapping all the required tools as workflow blocks, and despite the restrictions of the dataflow model used to represent workflows, we expect the migration to significantly improve the current state of the pipeline. Our aim is to make the solution flexible, traceable and reproducible, so that it can better accommodate the evolution of the tools, the data and the pipeline itself, and so that we can run it at national scale. This work will become the foundation for a more complete system that includes variant filtering and interpretation for diagnostic purposes.
Author Cala, Jacek
Wijaya, Eldarina Azfar
Missier, Paolo
Yaobo Xu
Author_xml – sequence: 1
  givenname: Jacek
  surname: Cala
  fullname: Cala, Jacek
  organization: Sch. of Comput. Sci., Newcastle Univ., Newcastle upon Tyne, UK
– sequence: 2
  surname: Yaobo Xu
  fullname: Yaobo Xu
  email: yaobo.xu@newcastle.ac.uk
  organization: Inst. of Genetic Med., Newcastle Univ., Newcastle upon Tyne, UK
– sequence: 3
  givenname: Eldarina Azfar
  surname: Wijaya
  fullname: Wijaya, Eldarina Azfar
  organization: Sch. of Comput. Sci., Newcastle Univ., Newcastle upon Tyne, UK
– sequence: 4
  givenname: Paolo
  surname: Missier
  fullname: Missier, Paolo
  organization: Sch. of Comput. Sci., Newcastle Univ., Newcastle upon Tyne, UK
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CCGrid.2014.128
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1479927848
9781479927845
EndPage 700
ExternalDocumentID 6846521
Genre orig-research
GroupedDBID 6IE
6IL
ALMA_UNASSIGNED_HOLDINGS
CBEJK
RIB
RIC
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 3
IngestDate Wed Dec 20 05:20:10 EST 2023
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 7
ParticipantIDs ieee_primary_6846521
PublicationCentury 2000
PublicationDate 2014-May
PublicationDateYYYYMMDD 2014-05-01
PublicationDate_xml – month: 05
  year: 2014
  text: 2014-May
PublicationDecade 2010
PublicationTitle 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
PublicationTitleAbbrev ccgrid
PublicationYear 2014
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib026764318
Score 1.5887394
SourceID ieee
SourceType Publisher
StartPage 694
SubjectTerms Bioinformatics
Cloud computing
Computational modeling
Genomics
Libraries
Pipelines
Title From Scripted HPC-Based NGS Pipelines to Workflows on the Cloud
URI https://ieeexplore.ieee.org/document/6846521
hasFullText 1
inHoldings 1
isFullTextHit
isPrint